Re: inconsistency/performance trap of empty terms

Robert Muir Thu, 28 Oct 2010 17:21:31 -0700

On Thu, Oct 28, 2010 at 7:59 PM, Chris Hostetter
<hossman_luc...@fucit.org> wrote:
>
> : Anyway, I think it's possible other users might be in this same
> : situation, with slow performance, and not even realizing it yet...
> : Obviously they can fix this if they go and add LengthFilter, but
> : should we be doing something different?
>
> On one level, I think a big improvement might just be to start encouraging
> more use of LengthFilter with min=1 at the end of analyzers by including
> it at the end of more "example" field types -- we should probably end
> every analyzer with that and RemoveDuplicatesTokenFilterFactory as a
> general pattern.
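To make the suggested tail of the chain concrete, here is a small Lucene-free model of what a length filter with min=1 followed by a remove-duplicates step does to a token stream. It is only a sketch: `AnalyzerTail`, `lengthFilter` and `removeDuplicates` are hypothetical names, not the real LengthFilter/RemoveDuplicatesTokenFilter classes. It assumes tokens are grouped by position (stacked tokens such as synonyms share one inner list) and it does not model position-increment gaps.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical, Lucene-free model of the tail of an analyzer chain.
// Each inner list holds the tokens stacked at one position.
class AnalyzerTail {

    // Mimics a length filter with min=1: drop tokens shorter than minLength.
    // A position left with no tokens is dropped (position gaps are not modeled).
    static List<List<String>> lengthFilter(List<List<String>> positions, int minLength) {
        List<List<String>> out = new ArrayList<>();
        for (List<String> stack : positions) {
            List<String> kept = new ArrayList<>();
            for (String term : stack) {
                if (term.length() >= minLength) {
                    kept.add(term);
                }
            }
            if (!kept.isEmpty()) {
                out.add(kept);
            }
        }
        return out;
    }

    // Mimics removing duplicates: drop a token identical to one already seen
    // at the same position, keeping first-seen order.
    static List<List<String>> removeDuplicates(List<List<String>> positions) {
        List<List<String>> out = new ArrayList<>();
        for (List<String> stack : positions) {
            out.add(new ArrayList<>(new LinkedHashSet<>(stack)));
        }
        return out;
    }

    public static void main(String[] args) {
        // "foo", an empty token produced upstream, then "bar" stacked with a duplicate.
        List<List<String>> tokens = List.of(
            List.of("foo"),
            List.of(""),
            List.of("bar", "bar"));
        System.out.println(removeDuplicates(lengthFilter(tokens, 1))); // [[foo], [bar]]
    }
}
```

Running the length filter first means the duplicate check never has to consider empty tokens at all.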


Why not just discard them completely in, say, the indexer/queryparser?
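A rough sketch of the indexer half of that idea, using a toy inverted index rather than Lucene's real IndexWriter (`ToyIndexer`, `addDocument` and `droppedEmpty` are hypothetical names). The point it illustrates: if the check lives at one choke point, no analyzer has to remember to filter, and the "" term never gets a postings list that would otherwise touch nearly every document. The same check would presumably apply to terms produced on the query side.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy inverted index (NOT Lucene's IndexWriter): term -> ids of docs containing it.
// Empty terms are discarded here, at one choke point, instead of relying on every analyzer.
class ToyIndexer {
    private final Map<String, List<Integer>> postings = new TreeMap<>();
    private int droppedEmpty = 0;

    // Assumes documents are added in increasing docId order.
    void addDocument(int docId, List<String> analyzedTerms) {
        for (String term : analyzedTerms) {
            if (term.isEmpty()) {
                // Without this check, "" would get a posting for almost every document.
                droppedEmpty++;
                continue;
            }
            List<Integer> docs = postings.computeIfAbsent(term, t -> new ArrayList<>());
            // Record each document at most once per term.
            if (docs.isEmpty() || docs.get(docs.size() - 1) != docId) {
                docs.add(docId);
            }
        }
    }

    Map<String, List<Integer>> postings() {
        return postings;
    }

    int droppedEmpty() {
        return droppedEmpty;
    }

    public static void main(String[] args) {
        ToyIndexer index = new ToyIndexer();
        index.addDocument(0, List.of("foo", "", "bar"));
        index.addDocument(1, List.of("", "foo"));
        System.out.println(index.postings());     // {bar=[0], foo=[0, 1]}
        System.out.println(index.droppedEmpty()); // 2
    }
}
```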

>
> How individual Tokenizers and TokenFilters deal with empty tokens seems
> like something that should be case by case -- the Ngram classes should
> allow/create them if the "min" value is 0, the pattern based classes
> should create them if the pattern matches an empty string, etc....
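For anyone unsure how these cases actually produce empty tokens, here is a plain-Java illustration. `EmptyTokenSources`, `ngrams` and `splitTokens` are hypothetical helpers, not Lucene's NGramTokenizer or PatternTokenizer, and the gram ordering is an arbitrary choice for the sketch. It only shows the mechanism: n-grams with min=0 yield one empty gram per offset, and splitting on a delimiter pattern yields an empty token between two adjacent delimiters (`Pattern.split` keeps interior empty strings and drops trailing ones).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helpers showing where empty tokens come from.
class EmptyTokenSources {

    // All substrings of length min..max; with min = 0 this emits one empty gram per offset.
    static List<String> ngrams(String text, int min, int max) {
        List<String> grams = new ArrayList<>();
        for (int n = min; n <= max; n++) {
            for (int start = 0; start + n <= text.length(); start++) {
                grams.add(text.substring(start, start + n));
            }
        }
        return grams;
    }

    // Split on a delimiter pattern; two adjacent delimiters yield an empty token between them.
    static List<String> splitTokens(String text, String regex) {
        return Arrays.asList(Pattern.compile(regex).split(text));
    }

    public static void main(String[] args) {
        System.out.println(ngrams("ab", 0, 2));            // [, , , a, b, ab]
        System.out.println(splitTokens("foo,,bar", ",")); // [foo, , bar]
    }
}
```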

Why should they create them? Is there some use case for the empty term
that you have found? (Because I can't think of a use case, except
making your search engine slower!)
