I think others will have more thoughts on this, esp. for Numeric* questions... but I'll try answering...
----- Original Message ----
> From: Tomislav Poljak <tpol...@gmail.com>
> To: java-user@lucene.apache.org
> Sent: Fri, May 7, 2010 2:34:46 PM
> Subject: Filter vs. TermQuery performance
>
> Hi,
> when is it wise to replace a TermQuery with a cached Filter
> (regarding search performance)? If a TermQuery is used only to filter results
> based on a field value (it doesn't participate in scoring), is it always wise
> to replace it with a filter?

Yes, assuming the filter will be reused. I think there is not a lot of value in using a filter (vs. just a regular query) if that filter will not be reused. This is why in Solr "fq"s (filter queries) are cached in a special filter cache. I *think* the only other benefit of using a filter vs., say, a TermQuery is that the filter will not spend any time/CPU on computing the score for the filter part.

> Is it only wise if the Filter is cached (wrapped in CachingWrapperFilter) and
> reused often?

I think so. See above.

> Does it matter how many distinct values the field has (which is related to how
> many matches/results are returned for one given/selected value, and also to how
> many times the same filter instance is reused)?

I *think* it matters. I think the more docs a filter matches, the higher the benefit from reusing a filter.

> For example, what if the filter for a single value matches
> only 5% of docs: should a filter be used, or is it better to use a TermQuery?
> What about if the filter for a single value matches 20%? Or 50%, or 75%?

I'm not sure...

> I have a question regarding caching performance/memory usage.
> Documents have date & time indexed (as NumericField) with minute resolution,
> and there are a few thousand unique date & time values in the index. On the search
> side an open-ended range filter (NumericRangeFilter) is used, with the current
> time as a parameter.
> Now, is it wise to cache NumericRangeFilter here
> (reuse an instance of CachingWrapperFilter wrapping NumericRangeFilter), since it
> will not be reused often (only by users searching at the same time in the same
> time zone)?

If the cache hit rate is low, why waste memory on caching? That is the logic I would apply here. If you have 3 queries and each uses a different date range, you will not see any benefit from caching. If 2 of those 3 queries use the exact same date range, then you will see caching benefits.

> Is it better to use NumericRangeFilter or NumericRangeQuery in this case?

I'm not sure, but I'd be happy to add specific advice to Javadoc when the answer is clear.

Otis
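To make the hit-rate argument about the three date-range queries concrete, here is a minimal sketch in plain Java. It is a toy model, not Lucene code: the class `FilterCacheSketch`, its method names, and the sample timestamps are all made up for illustration. It also assumes the open-ended range is a lower bound (timestamp >= bound); the original question does not say which end is open.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Toy model of a filter cache keyed by the range bound (hypothetical, not a Lucene class). */
public class FilterCacheSketch {
    private final Map<Long, BitSet> cache = new HashMap<>();
    private final long[] docTimestamps;
    private int hits = 0;
    private int misses = 0;

    public FilterCacheSketch(long[] docTimestamps) {
        this.docTimestamps = docTimestamps.clone();
    }

    /** Returns the docs with timestamp >= lowerBound, scanning the docs only on a cache miss. */
    public BitSet openEndedRange(long lowerBound) {
        BitSet cached = cache.get(lowerBound);
        if (cached != null) {
            hits++;
            return cached;
        }
        misses++;
        BitSet bits = new BitSet(docTimestamps.length);
        for (int doc = 0; doc < docTimestamps.length; doc++) {
            if (docTimestamps[doc] >= lowerBound) {
                bits.set(doc);
            }
        }
        cache.put(lowerBound, bits);
        return bits;
    }

    public int hits() { return hits; }

    public int misses() { return misses; }

    public static void main(String[] args) {
        long[] docs = {100, 200, 300, 400};

        // Three queries, three different bounds: every lookup is a miss.
        FilterCacheSketch distinct = new FilterCacheSketch(docs);
        for (long bound : new long[] {150, 250, 350}) {
            distinct.openEndedRange(bound);
        }
        System.out.println("distinct: hits=" + distinct.hits() + " misses=" + distinct.misses());

        // Two of the three queries share a bound: the repeat is served from the cache.
        FilterCacheSketch repeated = new FilterCacheSketch(docs);
        for (long bound : new long[] {150, 250, 150}) {
            repeated.openEndedRange(bound);
        }
        System.out.println("repeated: hits=" + repeated.hits() + " misses=" + repeated.misses());
    }
}
```

In the first run the cache holds three bit sets and none of them is ever reused, so it only costs memory. In the second run, the repeated bound skips the scan over the docs, which is where the benefit of caching would come from.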