Thanks Uwe! Performance does not seem much different (WildcardQuery seems to dominate). Are there any specific docValues settings that make this work best?

One more question: does anyone know why TermRangeQuery (somefield:[* TO *]) and WildcardQuery (somefield:*) do not return the exact number of docs having that field? See my test output for a field that all 100k documents have (with a random value):

   [junit4]   1> changing:[* TO *]
   [junit4]   1> 99526 hits
   [junit4]   1> changing:*
   [junit4]   1> 99526 hits
   [junit4]   1> fields:changing
   [junit4]   1> 100000 hits

Regards,
Tommaso

2014-10-30 17:46 GMT+01:00 Uwe Schindler <[email protected]>:

> Hi,
>
> there is already a Filter available (that optimizes this special case):
>
> http://lucene.apache.org/core/4_10_1/core/org/apache/lucene/search/FieldValueFilter.html
>
> To make a query out of it, use ConstantScoreQuery. But this filter is
> better used as a real filter, because it has a bitset behind it.
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: [email protected]
>
> *From:* Tommaso Teofili [mailto:[email protected]]
> *Sent:* Thursday, October 30, 2014 5:34 PM
> *To:* [email protected]
> *Subject:* "field exists" queries and benchmarks
>
> Hi all,
>
> I'm doing some (rough) tests / benchmarks in order to understand what's
> the best way of doing a "field exists" query.
>
> As far as I could find we can use TermRangeQuery (somefield:[* TO *]),
> WildcardQuery (somefield:*) or a plain TermQuery on another field where the
> doc's field names have been indexed (fields:somefield).
>
> Besides any other suggestions on how to accomplish that (very much
> welcome), I'd like to understand what the expected performance of each
> of the above approaches is, because in my case the TermRangeQuery seems to be
> the least performant while the other 2 are on average on the same level.
>
> One strange thing is that with TermRangeQuery and WildcardQuery the
> hit count is not fully correct; I mean that with 100k docs I get the
> correct hit count only with the TermQuery approach.
>
> Code and sample outputs can be found at [1].
>
> Any hint would be appreciated.
>
> Regards,
> Tommaso
>
> [1] : https://gist.github.com/tteofili/52856d938fcd465eab58
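
To make the difference between the approaches concrete, here is a rough, dependency-free sketch of a toy in-memory inverted index. It is not Lucene code: the class FieldExistsSketch, its methods, and the whitespace "analysis" are all hypothetical and only serve as illustration. It shows that a lookup on the fields field reads a single posting list, while a match-all-terms lookup (in the spirit of somefield:* or somefield:[* TO *]) has to union the posting list of every term in the field. It also shows one way the two counts could diverge in such a model: a document whose value produces no indexed term. Whether that is what happens in the test from the gist is not established here.

```java
import java.util.BitSet;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Toy inverted index; every name here is hypothetical, none of it is Lucene API. */
public class FieldExistsSketch {

    // field name -> (term -> ids of the docs containing that term)
    private final Map<String, TreeMap<String, BitSet>> postings = new HashMap<>();
    private int docCount = 0;

    /** Indexes one document and records each field name as a term of the "fields" field. */
    public void addDocument(Map<String, String> doc) {
        int docId = docCount++;
        for (Map.Entry<String, String> e : doc.entrySet()) {
            addTerm("fields", e.getKey(), docId);
            // Assumed whitespace "analysis": an empty or blank value yields no terms.
            for (String token : e.getValue().trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    addTerm(e.getKey(), token, docId);
                }
            }
        }
    }

    private void addTerm(String field, String term, int docId) {
        postings.computeIfAbsent(field, f -> new TreeMap<>())
                .computeIfAbsent(term, t -> new BitSet())
                .set(docId);
    }

    private TreeMap<String, BitSet> termsOf(String field) {
        TreeMap<String, BitSet> terms = postings.get(field);
        return terms == null ? new TreeMap<String, BitSet>() : terms;
    }

    /** In the spirit of fields:name, reads exactly one posting list. */
    public int countViaFieldsTerm(String name) {
        BitSet docs = termsOf("fields").get(name);
        return docs == null ? 0 : docs.cardinality();
    }

    /** In the spirit of name:* or name:[* TO *], unions the postings of every term. */
    public int countViaAllTerms(String name) {
        BitSet union = new BitSet();
        for (BitSet docs : termsOf(name).values()) {
            union.or(docs);
        }
        return union.cardinality();
    }

    public static void main(String[] args) {
        FieldExistsSketch index = new FieldExistsSketch();
        index.addDocument(Collections.singletonMap("changing", "abc"));
        index.addDocument(Collections.singletonMap("changing", "xyz"));
        index.addDocument(Collections.singletonMap("changing", "")); // no term produced
        System.out.println(index.countViaFieldsTerm("changing")); // prints 3
        System.out.println(index.countViaAllTerms("changing"));   // prints 2
    }
}
```

In this model the cost of countViaAllTerms grows with the number of distinct terms in the field, whereas countViaFieldsTerm stays a single lookup, which matches the intuition behind indexing field names in a separate field.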
