On Tue, Feb 8, 2011 at 9:12 AM, David Smiley (@MITRE.org) <dsmi...@mitre.org> wrote:
> I'm skeptical that whatever the difference is is relevant in the scheme of
> things. The cost to keeping it is introducing confusion on users, and more
> code to maintain.
>

It's pretty significant. CharFilters are not reusable, and MappingCharFilter boxes every character and looks it up in a hashmap (I made a patch to fix the reusability, but no one has commented): https://issues.apache.org/jira/browse/LUCENE-2788

ASCIIFoldingFilter does a huge switch (which still isn't optimal), but it's way, way faster than MappingCharFilter, especially since it's a no-op for chars < 0x80.

ICUFoldingFilter precompiles a recursively decomposed trie, so its lookup is a Unicode folded trie (icu-project.org/docs/papers/foldedtrie_iuc21.ppt). I think it's a tad slower than ASCIIFoldingFilter, but it also incorporates case folding and Unicode normalization: neither ASCIIFoldingFilter nor MappingCharFilter will properly fold http://www.geonames.org/search.html?q=Ab%C5%AB+Z%CC%A7aby&country=, because there is no composed form for Z + combining cedilla, but ICUFoldingFilter will.
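
To make the last point concrete, here is a rough sketch using only the JDK. It is not the Lucene or ICU code: the class name FoldingSketch, both method names, and the two-entry table are made up for illustration. foldByTable stands in for any approach keyed on single precomposed characters, and foldByDecomposition approximates accent folding by decomposing to NFD and dropping nonspacing marks (it does no case folding and none of the other foldings ICU performs).

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Map;

// Illustration only: hypothetical names, not Lucene's or ICU's implementation.
final class FoldingSketch {

    // Hypothetical mapping table keyed on single precomposed characters.
    private static final Map<Character, Character> TABLE = new HashMap<>();
    static {
        TABLE.put('\u016B', 'u'); // u with macron
        TABLE.put('\u00E7', 'c'); // c with cedilla
    }

    // Folds one char at a time; ASCII passes through untouched.
    // A base letter followed by a combining mark is never matched as a unit.
    static String foldByTable(String in) {
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            if (c < 0x80) {
                out.append(c);
                continue;
            }
            Character mapped = TABLE.get(c); // boxes the char for the lookup
            out.append(mapped != null ? mapped.charValue() : c);
        }
        return out.toString();
    }

    // Decomposes to NFD, then removes nonspacing marks (general category Mn).
    static String foldByDecomposition(String in) {
        String decomposed = Normalizer.normalize(in, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{Mn}+", "");
    }
}
```

For the input "Ab\u016B Z\u0327aby", foldByTable folds the precomposed u with macron but leaves U+0327 in the output, since there is no single Z-with-cedilla character for the table to key on, while foldByDecomposition returns "Abu Zaby".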