On Thu, Oct 28, 2010 at 7:59 PM, Chris Hostetter <hossman_luc...@fucit.org> wrote:
>
> : Anyway, I think it's possible other users might be in this same
> : situation, with slow performance, and not even realizing it yet...
> : Obviously they can fix this if they go and add LengthFilter, but
> : should we be doing something different?
>
> On one level, I think a big improvement might just be to start encouraging
> more use of LengthFilter with min=1 at the end of analyzers by including
> it at the end of more "example" field types -- we should probably end
> every analyzer with that and RemoveDuplicatesTokenFilterFactory as a
> general pattern.

Why not just discard them completely in, say, the indexer/queryparser?

> How individual Tokenizers and TokenFilters deal with empty tokens seems
> like something that should be case by case -- the Ngram classes should
> allow/create them if the "min" value is 0, the pattern based classes
> should create them if the pattern matches an empty string, etc....

Why should they create them? Is there some use case for the empty term
that you have found? (I can't think of a use case, except making
your search engine slower!)
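
To make the "discard them in the indexer/queryparser" idea concrete, here is a
rough sketch in plain Java. It is only an illustration, not Lucene or Solr
code: the class EmptyTermSketch and the method dropEmptyTerms are hypothetical
names, and the analyzed terms are modeled as a simple list of strings rather
than a real TokenStream. The point is just that zero-length terms could be
dropped at one central place instead of relying on every analyzer chain to end
with LengthFilter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EmptyTermSketch {

    // Hypothetical helper (not a Lucene/Solr API): drops zero-length terms
    // from an already-analyzed list of term strings, preserving order.
    public static List<String> dropEmptyTerms(List<String> terms) {
        List<String> kept = new ArrayList<>();
        for (String term : terms) {
            if (term != null && !term.isEmpty()) {
                kept.add(term);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Assumed input: e.g. a pattern-based tokenizer whose pattern
        // matched the empty string twice.
        List<String> analyzed = Arrays.asList("foo", "", "bar", "");
        System.out.println(dropEmptyTerms(analyzed));
    }
}
```

With the sample input in main, this prints [foo, bar].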