I think others will have more thoughts on this, esp. for Numeric* questions... 
but I'll try answering...
 

----- Original Message ----
> From: Tomislav Poljak <tpol...@gmail.com>
> To: java-user@lucene.apache.org
> Sent: Fri, May 7, 2010 2:34:46 PM
> Subject: Filter vs. TermQuery performance
> 
> Hi,
> when is it wise to replace a TermQuery with cached Filter 
> (regarding search performance). If TermQuery is used only to filter results 
> based on field value (it doesn't participate in scoring), is it alway wise 
> to replace it with filter? 

Yes, assuming the filter will be reused.  I think there is not a lot of value 
in using a filter (vs. just a regular query) if that filter will not be reused. 
 This is why in Solr "fq"s (filtered queries) are cached in a special filter 
cache.  I *think* the only other benefit of using a filter query vs., say, 
TermQuery, is that the former will not spend any time/CPU on computing the 
score for the filter part.

> Is it only wise if Filter is cached (wrapped in CachingWrapperFilter) and 
> reused often?

I think so.  See above.

> Does it matter how many 
> distinct values field has (which is related to how many matches/results for 
> one given/selected value is returned and also with how many times same filter 
> instance is reused)?

I *think* it matters.  I think the more docs a filter matches, the higher the 
benefit from reusing a filter.

> For example, what if filter for single value matches 
> only 5% of docs, should filter be used or is it better to use TermQuery? 
> What about if filter for single value matches 20%? or 50% or 
> 75%

I'm not sure...

> I have a question regarding caching performance/memory usage. 
> Documents have date&time indexed (as NumericField) with minute resolution 
> and there are few thousands unique date&time in index. On the search 
> side open ended range filter is used (NumericRangeFilter) with current 
> time as a parameter.

> Now, is it wise to cache NumericRangeFilter here 
> (reuse instance of CachingWrapperFilter wrapping NumericRangeFilter) since it 
> will not be reused often (only from users searching at same time in same time 
> zone)?

If the cache hit rate is low, why waste memory on caching is what I would think 
is the logic to apply here.
If you have 3 queries, and each uses a different date range query, then you 
will not see benefits from caching..
If 2 of those 3 queries use the exact same date range query, then you will see 
caching benefits.

> Is it better to use NumericRangeFilter or NumericRangeQuery in this case?

I'm not sure, but I'd be happy to add specific advice to Javadoc when the 
answer is clear.

Otis

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Reply via email to