Hi,
be aware that FieldValueFilter uses FieldCache (and may possibly use DocValues, if the field was indexed with them; I am not sure about this case), so it might be slower on the first run. In any case, as this is a BitSet filter, it's best if executed together with another query that drives the iteration. Otherwise it just naively increments document numbers one by one until a match is found.

In theory, TermRangeQuery should return the same results, but maybe you have some issues with deleted documents? Another thing might be that the wildcard does not match all your fields, e.g. maybe because it's the empty string? In theory it should match; it would just be something to look into. Maybe there is a real bug. Which version of Lucene?

Is the number returned by FieldValueFilter identical to TermRange/Wildcard? Or is it correct with respect to your other approach?

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [email protected]

From: Tommaso Teofili [mailto:[email protected]]
Sent: Friday, October 31, 2014 1:21 PM
To: [email protected]
Subject: Re: "field exists" queries and benchmarks

thanks Uwe!

Performance does not seem much different (WildcardQuery seems to dominate). Are there any specific docValue settings to make that work best?

One more question: does anyone know why TermRangeQuery (somefield:[* TO *]) and WildcardQuery (somefield:*) do not return the exact number of docs having that field? See my test output for a field that all 100k documents have (with a random value):

  [junit4] 1> changing:[* TO *]
  [junit4] 1> 99526 hits
  [junit4] 1> changing:*
  [junit4] 1> 99526 hits
  [junit4] 1> fields:changing
  [junit4] 1> 100000 hits

Regards,
Tommaso

2014-10-30 17:46 GMT+01:00 Uwe Schindler <[email protected]>:

Hi,

there is already a Filter available (that optimizes this special case):
http://lucene.apache.org/core/4_10_1/core/org/apache/lucene/search/FieldValueFilter.html

To make a query out of it use ConstantScoreQuery.
But this filter is better used as a real filter, because it has a bitset behind it.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [email protected]

From: Tommaso Teofili [mailto:[email protected]]
Sent: Thursday, October 30, 2014 5:34 PM
To: [email protected]
Subject: "field exists" queries and benchmarks

Hi all,

I'm doing some (rough) tests / benchmarks in order to understand the best way of doing a "field exists" query. As far as I could find, we can use TermRangeQuery (somefield:[* TO *]), WildcardQuery (somefield:*) or a plain TermQuery on another field where the doc's field names have been indexed (fields:somefield).

Besides any other suggestions on how to accomplish that (very much welcome), I'd like to understand the expected performance of each of the above approaches, because in my case the TermRangeQuery seems to be the least performant while the other two are on average at the same level.

One strange thing is that with TermRangeQuery and WildcardQuery the hit count is not fully correct, meaning that with 100k docs I get the correct hit count only with the TermQuery approach.

Code and sample outputs can be found at [1]. Any hint would be appreciated.

Regards,
Tommaso

[1] : https://gist.github.com/tteofili/52856d938fcd465eab58
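
A side note on Uwe's remark that a bitset-backed filter works best when another query drives the iteration: the toy model below may help to see why. It is only a sketch in plain Java with java.util.BitSet, not Lucene code. The class FilterIterationSketch, its methods and the probe counter are all hypothetical names made up for illustration, and the assumption is simply that a filter iterating on its own has to test every document number in turn, while a driving query only asks the bitset about its own candidates.

```java
import java.util.BitSet;

public class FilterIterationSketch {

    /** Outcome of one counting pass: matching docs and number of bit lookups. */
    static final class Result {
        final int hits;
        final long probes;

        Result(int hits, long probes) {
            this.hits = hits;
            this.probes = probes;
        }
    }

    /** The filter iterates by itself: every doc number is tested in turn. */
    static Result iterateAlone(BitSet docsWithField, int maxDoc) {
        int hits = 0;
        long probes = 0;
        for (int doc = 0; doc < maxDoc; doc++) {
            probes++;
            if (docsWithField.get(doc)) {
                hits++;
            }
        }
        return new Result(hits, probes);
    }

    /** Another query drives: only its candidate docs are checked against the bits. */
    static Result drivenBy(int[] candidates, BitSet docsWithField) {
        int hits = 0;
        long probes = 0;
        for (int doc : candidates) {
            probes++;
            if (docsWithField.get(doc)) {
                hits++;
            }
        }
        return new Result(hits, probes);
    }

    public static void main(String[] args) {
        int maxDoc = 100_000;

        // Hypothetical index: only every 1000th document has the field.
        BitSet docsWithField = new BitSet(maxDoc);
        for (int doc = 0; doc < maxDoc; doc += 1000) {
            docsWithField.set(doc);
        }

        // Hypothetical driving query that matches just four documents.
        int[] candidates = {5_000, 50_000, 50_001, 99_000};

        Result alone = iterateAlone(docsWithField, maxDoc);
        Result driven = drivenBy(candidates, docsWithField);

        System.out.println("alone:  " + alone.hits + " hits, " + alone.probes + " probes");
        System.out.println("driven: " + driven.hits + " hits, " + driven.probes + " probes");
    }
}
```

With these made-up numbers the standalone pass needs 100000 probes to find 100 hits, while the driven pass needs 4 probes to find 3 hits. How large the gap is in a real index depends on the selectivity of the driving query, so this says nothing about the benchmark numbers discussed in the thread.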
