RE: Should ASCIIFoldingFilter be deprecated?

David Smiley (@MITRE.org) Tue, 08 Feb 2011 06:12:41 -0800


Chris Hostetter-3 wrote:
> 
> CharFilters and TokenFilters have different purposes though...
> 
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#When_To_use_a_CharFilter_vs_a_TokenFilter
> 
> (i.e., if you use MappingCharFilter, you can't then tokenize on any of the 
> characters you filtered away)
>


Right, but it's hard to imagine wanting to tokenize on an accent character,
or on any other character that these particular mapping files rewrite.
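The ordering question can be illustrated with a toy pipeline. This is a minimal sketch, not Lucene code: the class FoldOrderDemo and its methods are hypothetical, the "tokenizer" just splits on whitespace and hyphens, and accent stripping uses java.text.Normalizer (NFD decomposition, then dropping combining marks) as a simplified stand-in for ASCIIFoldingFilter, which folds many more characters (for example sharp s to "ss").

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical toy pipeline (not Lucene's API) contrasting char-level
// filtering before tokenization with token-level filtering after it.
class FoldOrderDemo {

    // Simplified stand-in for accent folding: NFD-decompose, then drop
    // combining marks. Unlike ASCIIFoldingFilter it leaves ligatures and
    // characters such as sharp s untouched.
    static String stripAccents(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    // Toy tokenizer: split on runs of whitespace and hyphens.
    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        for (String t : s.split("[\\s-]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // CharFilter style: rewrite the raw text first, then tokenize the result.
    static List<String> charFilterFirst(String text, UnaryOperator<String> charFilter) {
        return tokenize(charFilter.apply(text));
    }

    // TokenFilter style: tokenize the raw text, then fold each token.
    static List<String> tokenFilterAfter(String text) {
        List<String> folded = new ArrayList<>();
        for (String token : tokenize(text)) {
            folded.add(stripAccents(token));
        }
        return folded;
    }

    public static void main(String[] args) {
        String text = "caf\u00e9 au-lait";
        System.out.println(charFilterFirst(text, FoldOrderDemo::stripAccents)); // [cafe, au, lait]
        System.out.println(tokenFilterAfter(text));                           // [cafe, au, lait]
        // The hyphen is mapped away before tokenizing, so "aulait" stays one token.
        System.out.println(charFilterFirst(text, s -> s.replace("-", "")));
    }
}
```

For accent folding, both orders give the same tokens, which is the point made above; only a mapping that removes a character the tokenizer splits on (here the hyphen) changes the token stream.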


Steven A Rowe wrote:
> 
> AFAIK, ISOLatin1AccentFilter was deprecated because ASCIIFoldingFilter
> provides a superset of its mappings.
> 

*If* that is the case, then this file should also be removed:
solr/example/solr/conf/mapping-ISOLatin1Accent.txt
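The "*If*" could be checked mechanically: load both tables and confirm that every mapping in the older one appears, with the same output, in the newer one. A minimal sketch with a hypothetical MappingSupersetCheck class; the sample maps are illustrative only, not the real ISOLatin1AccentFilter or ASCIIFoldingFilter tables.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: reports where `superset` fails to cover `subset`.
class MappingSupersetCheck {

    // Returns each source string of `subset` that `superset` lacks or maps
    // differently, paired with what `superset` has ("<missing>" if absent).
    static Map<String, String> mismatches(Map<String, String> subset, Map<String, String> superset) {
        Map<String, String> problems = new TreeMap<>();
        for (Map.Entry<String, String> e : subset.entrySet()) {
            String other = superset.get(e.getKey());
            if (!e.getValue().equals(other)) {
                problems.put(e.getKey(), other == null ? "<missing>" : other);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Illustrative tables only: A-grave, AE ligature, A-macron.
        Map<String, String> older = Map.of("\u00C0", "A", "\u00C6", "AE");
        Map<String, String> newer = Map.of("\u00C0", "A", "\u00C6", "AE", "\u0100", "A");
        System.out.println(mismatches(older, newer).isEmpty()); // true
        System.out.println(mismatches(newer, older).size());    // 1
    }
}
```

An empty result means the larger table really does cover the smaller one; a non-empty result lists exactly which entries would change behavior if the old file were dropped.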


Steven A Rowe wrote:
> 
> I haven't done any benchmarking, but I'm pretty sure that
> ASCIIFoldingFilter can achieve a significantly higher throughput rate than
> MappingCharFilter, and given that, it probably makes sense to keep both,
> to allow people to make the choice about the tradeoff between the
> flexibility provided by the human-readable (and editable) mapping file and
> the speed provided by ASCIIFoldingFilter.
> 
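The tradeoff Steven describes can be made concrete with a minimal sketch, assuming rule lines shaped like "\u00C0" => "A" as in Solr's mapping-*.txt files (check the actual files for their exact syntax). The class MappingVsSwitchDemo and all of its methods are hypothetical; foldSwitch covers only four characters for illustration, and nothing here is a benchmark.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch (not Lucene/Solr code) of the two designs: an editable
// rule table applied by longest match, versus folding hard-coded in a switch.
class MappingVsSwitchDemo {

    // Rule lines loosely modeled on Solr's mapping files: "source" => "target".
    private static final Pattern RULE = Pattern.compile("^\"(.*?)\"\\s*=>\\s*\"(.*)\"$");

    static Map<String, String> parseRules(List<String> lines) {
        Map<String, String> rules = new HashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            Matcher m = RULE.matcher(line);
            if (!m.matches()) {
                throw new IllegalArgumentException("Unparseable rule: " + line);
            }
            rules.put(unescape(m.group(1)), unescape(m.group(2)));
        }
        return rules;
    }

    // Decodes backslash-u escapes (four hex digits) into characters.
    static String unescape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 5 < s.length() && s.charAt(i + 1) == 'u') {
                sb.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                i += 5;
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Flexible path: at each position, try the longest source string first.
    static String foldMapped(String s, Map<String, String> rules) {
        int maxLen = rules.keySet().stream().mapToInt(String::length).max().orElse(0);
        StringBuilder sb = new StringBuilder(s.length());
        int i = 0;
        outer:
        while (i < s.length()) {
            for (int len = Math.min(maxLen, s.length() - i); len > 0; len--) {
                String target = rules.get(s.substring(i, i + len));
                if (target != null) {
                    sb.append(target);
                    i += len;
                    continue outer;
                }
            }
            sb.append(s.charAt(i++));
        }
        return sb.toString();
    }

    // Fixed path: one switch per char, no map lookups or substrings.
    static String foldSwitch(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\u00C0': case '\u00C1': sb.append('A'); break;
                case '\u00E9': sb.append('e'); break;
                case '\u00DF': sb.append("ss"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> rules = parseRules(List.of(
                "# Latin capital A with grave",
                "\"\\u00C0\" => \"A\"",
                "\"\\u00DF\" => \"ss\""));
        String text = "\u00C0 la stra\u00DFe";
        System.out.println(foldMapped(text, rules)); // A la strasse
        System.out.println(foldSwitch(text));        // A la strasse
    }
}
```

The mapped version can be changed by editing a text file but pays for a hash lookup and a substring per candidate length at each position; the switch does a single dispatch per character but needs a code change for every new mapping. Whether that per-character cost matters in a full analysis chain is exactly the open question in this thread.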

I'm skeptical that whatever difference there is matters in the grand scheme
of things. The cost of keeping it is confusion for users and more code to
maintain.

~ David Smiley

-----
 Author: https://www.packtpub.com/solr-1-4-enterprise-search-server/book
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Should-ASCIIFoldingFilter-be-deprecated-tp2448919p2451504.html
Sent from the Solr - Dev mailing list archive at Nabble.com.

