On Tue, Feb 8, 2011 at 9:12 AM, David Smiley (@MITRE.org)
<dsmi...@mitre.org> wrote:

> I'm skeptical that whatever the difference is is relevant in the scheme of
> things. The cost to keeping it is introducing confusion on users, and more
> code to maintain.
>

it's pretty significant. charfilters are not reusable, and they box every
character and look it up in a hashmap (i made a patch to fix the
reusability, but no one has commented):
https://issues.apache.org/jira/browse/LUCENE-2788
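
To make the per-character cost concrete, here is a minimal sketch of the general "box each char, then look it up in a hash map" pattern. It is not the actual mappingcharfilter code; the class name BoxedLookupSketch, the MAPPINGS table and the fold method are all made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: NOT Lucene's MappingCharFilter, just the general pattern.
final class BoxedLookupSketch {
    // Hypothetical mapping table keyed by boxed Character.
    private static final Map<Character, String> MAPPINGS = new HashMap<>();
    static {
        MAPPINGS.put('\u00E9', "e");   // e with acute
        MAPPINGS.put('\u00FC', "u");   // u with diaeresis
        MAPPINGS.put('\u00DF', "ss");  // sharp s
    }

    static String fold(CharSequence in) {
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            // c is autoboxed to Character here, then hashed and compared;
            // this happens for every character, mapped or not.
            String replacement = MAPPINGS.get(c);
            if (replacement != null) {
                out.append(replacement);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fold("caf\u00E9 stra\u00DFe")); // prints "cafe strasse"
    }
}
```

Even plain ASCII input pays for the box-and-hash step on every character in this pattern, which is where the overhead comes from.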

asciifoldingfilter does a huge switch (which still isn't optimal), but
it's way, way faster than mappingcharfilter, especially since it's a
no-op for chars below 0x80.
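
A rough sketch of the switch-plus-fast-path idea follows. It is a toy, not the real asciifoldingfilter (which covers far more code points and works on the token's char buffer); the class name SwitchFoldSketch and the handful of cases are assumptions picked for illustration.

```java
// Illustrative only: a tiny stand-in for a switch-based ASCII folder.
final class SwitchFoldSketch {
    static String fold(CharSequence in) {
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            // Fast path: ASCII is copied through with a single comparison,
            // no boxing and no table lookup.
            if (c < 0x80) {
                out.append(c);
                continue;
            }
            // Slow path: a primitive switch over the non-ASCII char.
            switch (c) {
                case '\u00C0': case '\u00C1': case '\u00C2':
                    out.append('A');
                    break;
                case '\u00E0': case '\u00E1': case '\u00E2':
                    out.append('a');
                    break;
                case '\u00E8': case '\u00E9': case '\u00EA':
                    out.append('e');
                    break;
                case '\u00DF':
                    out.append("ss"); // one char may fold to several
                    break;
                default:
                    out.append(c);    // unknown chars pass through unchanged
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fold("caf\u00E9 stra\u00DFe")); // prints "cafe strasse"
    }
}
```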

icufoldingfilter precompiles a recursively decomposed trie, so its
lookup is a unicode folded trie
(icu-project.org/docs/papers/foldedtrie_iuc21.ppt). I think it's a tad
slower than asciifoldingfilter, but it also incorporates case folding
and unicode normalization: neither asciifoldingfilter nor
mappingcharfilter will properly fold
http://www.geonames.org/search.html?q=Ab%C5%AB+Z%CC%A7aby&country=,
because there is no composed form for Z + combining cedilla, but
icufoldingfilter will.
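
The Z + combining cedilla case can be reproduced with just the JDK. The sketch below is a stand-in, not what icufoldingfilter does internally: it decomposes with java.text.Normalizer (NFD) and strips nonspacing marks, and it leaves out case folding entirely. The class name DecomposeFoldSketch and its two methods are made up for illustration.

```java
import java.text.Normalizer;

// Illustrative only: JDK-based approximation, NOT ICU's folding algorithm.
final class DecomposeFoldSketch {
    // Length of the NFC (composed) form, to show whether a precomposed
    // code point exists for a base letter + combining mark sequence.
    static int composedLength(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC).length();
    }

    // Decompose to NFD, then drop all nonspacing marks (category Mn).
    static String fold(String in) {
        String decomposed = Normalizer.normalize(in, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{Mn}+", "");
    }

    public static void main(String[] args) {
        // u + combining macron composes to a single code point (U+016B) ...
        System.out.println(composedLength("u\u0304")); // prints 1
        // ... but Z + combining cedilla has no composed form and stays two chars,
        // so a one-char-to-replacement table never sees a single "Z cedilla".
        System.out.println(composedLength("Z\u0327")); // prints 2
        System.out.println(fold("Ab\u016B Z\u0327aby")); // prints "Abu Zaby"
    }
}
```

A per-character table or switch only handles the precomposed U+016B; the decomposed sequence needs normalization-aware handling like the above.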

