Hi,

 

be aware that FieldValueFilter uses the FieldCache (and may possibly use DocValues 
if the field was indexed with them; I am not sure for this case), so it might be 
slower on the first run. In any case, as this is a BitSet filter, it's best executed 
together with another query that drives the iteration. Otherwise it just increments 
document numbers one by one until a match is found, which is plain stupid.
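The "slower on the first run" effect can be pictured with a simplified model. This is plain Java, not Lucene's FieldCache; the class name and the map-per-document representation are hypothetical stand-ins. The per-field "documents that have a value" bitset is computed once, on first access, and reused afterwards.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified model (not Lucene's FieldCache): a per-field bitset built lazily on first use. */
class LazyDocsWithFieldCache {
    private final List<Map<String, String>> docs; // hypothetical documents; doc ID = list index
    private final Map<String, BitSet> cache = new HashMap<>();
    int builds = 0; // how many times the expensive per-field build actually ran

    LazyDocsWithFieldCache(List<Map<String, String>> docs) {
        this.docs = docs;
    }

    /** First call for a field pays the full build cost; later calls return the cached bitset. */
    BitSet docsWithField(String field) {
        return cache.computeIfAbsent(field, f -> {
            builds++;
            BitSet bits = new BitSet(docs.size());
            for (int doc = 0; doc < docs.size(); doc++) {
                if (docs.get(doc).containsKey(f)) {
                    bits.set(doc);
                }
            }
            return bits;
        });
    }

    public static void main(String[] args) {
        LazyDocsWithFieldCache cache = new LazyDocsWithFieldCache(List.of(
            Map.of("changing", "a"),
            Map.of("other", "b"),
            Map.of("changing", "c")));
        System.out.println(cache.docsWithField("changing").cardinality()); // 2, built now
        System.out.println(cache.docsWithField("changing").cardinality()); // 2, served from cache
        System.out.println("builds=" + cache.builds);                      // builds=1
    }
}
```

In a benchmark this means the first query against a field includes the build cost, so warm-up runs should be separated from measured runs.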

 

In theory, TermRangeQuery should return the same results, but maybe you have 
some issues with deleted documents? Another possibility is that the wildcard 
does not match all your field values, e.g. because some value is the empty string? In 
theory it should still match; it would just be something to look into. Maybe there is 
a real bug. Which version of Lucene are you using?

 

Is the number returned by FieldValueFilter identical to TermRange/Wildcard? Or 
is it correct with respect to your other approach?

 

Uwe

 

-----

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

http://www.thetaphi.de

eMail: [email protected]

 

From: Tommaso Teofili [mailto:[email protected]] 
Sent: Friday, October 31, 2014 1:21 PM
To: [email protected]
Subject: Re: "field exists" queries and benchmarks

 

thanks Uwe!

 

Performance does not seem much different (WildcardQuery seems to dominate). Are 
there any specific DocValues settings to make it work best?

 

One more question: does anyone know why TermRangeQuery (somefield:[* TO *]) and 
WildcardQuery (somefield:*) do not return the exact number of docs having that 
field? See my test output for a field that all 100k documents have (each with a 
random value):

   [junit4]   1> changing:[* TO *]
   [junit4]   1> 99526 hits
   [junit4]   1> changing:*
   [junit4]   1> 99526 hits
   [junit4]   1> fields:changing
   [junit4]   1> 100000 hits

 

Regards,

Tommaso

 

 

2014-10-30 17:46 GMT+01:00 Uwe Schindler <[email protected]>:

Hi,

 

there is already a Filter available that optimizes this special case:

http://lucene.apache.org/core/4_10_1/core/org/apache/lucene/search/FieldValueFilter.html

 

To make a query out of it, use ConstantScoreQuery. But this filter is better 
used as a real filter, because it has a bitset behind it.
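Why a bitset-backed filter prefers to be a real filter can be shown with a small model. This is plain Java, not Lucene code; the class, method names and the candidate doc IDs are hypothetical. On its own (e.g. wrapped in ConstantScoreQuery) the filter has to step through every document number and test its bit; combined with another query, only that query's hits are probed, each with a constant-time bit lookup.

```java
import java.util.BitSet;

/** Simplified model (not Lucene code) of matching a "docs with field" bitset. */
class BitSetFilterIteration {

    /** Standalone: visit every doc ID from 0 to maxDoc-1 and test its bit. Returns {hits, visited}. */
    static int[] scanAlone(BitSet docsWithField, int maxDoc) {
        int hits = 0, visited = 0;
        for (int doc = 0; doc < maxDoc; doc++) {
            visited++;
            if (docsWithField.get(doc)) hits++;
        }
        return new int[] {hits, visited};
    }

    /** Driven: another query supplies candidate doc IDs; only those bits are tested. Returns {hits, visited}. */
    static int[] drivenByQuery(BitSet docsWithField, int[] queryDocs) {
        int hits = 0, visited = 0;
        for (int doc : queryDocs) {
            visited++;
            if (docsWithField.get(doc)) hits++; // random-access bit lookup
        }
        return new int[] {hits, visited};
    }

    public static void main(String[] args) {
        int maxDoc = 100_000;
        BitSet docsWithField = new BitSet(maxDoc);
        for (int doc = 0; doc < maxDoc; doc += 2) docsWithField.set(doc); // even docs have the field
        int[] queryDocs = {4, 7, 10, 99_999}; // hypothetical hits of some other, selective query
        int[] alone = scanAlone(docsWithField, maxDoc);
        int[] driven = drivenByQuery(docsWithField, queryDocs);
        System.out.println("alone:  hits=" + alone[0] + " visited=" + alone[1]);   // hits=50000 visited=100000
        System.out.println("driven: hits=" + driven[0] + " visited=" + driven[1]); // hits=2 visited=4
    }
}
```

The work in the standalone case is proportional to the number of documents in the index, while the driven case is proportional to the number of hits of the driving query.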

 

Uwe


From: Tommaso Teofili [mailto:[email protected]] 
Sent: Thursday, October 30, 2014 5:34 PM
To: [email protected]
Subject: "field exists" queries and benchmarks

 

Hi all,

 

I'm doing some (rough) tests / benchmarks in order to understand the 
best way of doing a "field exists" query.

 

As far as I could find, we can use TermRangeQuery (somefield:[* TO *]), 
WildcardQuery (somefield:*) or a plain TermQuery on another field where the 
doc's field names have been indexed (fields:somefield).
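In theory all three approaches count the same documents. A toy inverted index makes the difference in work visible. This is plain Java, not Lucene; the class, its methods and the reserved "fields" field name are hypothetical. Range and wildcard queries on the field have to enumerate every distinct term of it and union their postings, whereas the extra "fields" field turns the question into one term lookup with a single postings list.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy inverted index (not Lucene) contrasting two ways to answer "field exists". */
class FieldExistsModel {
    // field -> (term -> docs containing that term); terms sorted like a term dictionary
    final Map<String, TreeMap<String, BitSet>> index = new HashMap<>();

    /** Index one document; also records each field name as a term of the extra "fields" field. */
    void add(int doc, Map<String, String> fields) {
        for (Map.Entry<String, String> e : fields.entrySet()) {
            post(e.getKey(), e.getValue(), doc);
            post("fields", e.getKey(), doc);
        }
    }

    private void post(String field, String term, int doc) {
        index.computeIfAbsent(field, f -> new TreeMap<>())
             .computeIfAbsent(term, t -> new BitSet())
             .set(doc);
    }

    /** Like somefield:[* TO *] or somefield:* : visit every term of the field and OR its postings. */
    int countByTermEnumeration(String field) {
        BitSet union = new BitSet();
        for (BitSet postings : index.getOrDefault(field, new TreeMap<>()).values()) {
            union.or(postings);
        }
        return union.cardinality();
    }

    /** Like fields:somefield : a single term lookup and a single postings list. */
    int countByFieldsField(String field) {
        BitSet postings = index.getOrDefault("fields", new TreeMap<>()).get(field);
        return postings == null ? 0 : postings.cardinality();
    }

    public static void main(String[] args) {
        FieldExistsModel m = new FieldExistsModel();
        List<Map<String, String>> docs = List.of(
            Map.of("changing", "x", "title", "a"),
            Map.of("title", "b"),
            Map.of("changing", "y"));
        for (int doc = 0; doc < docs.size(); doc++) m.add(doc, docs.get(doc));
        System.out.println(m.countByTermEnumeration("changing")); // 2
        System.out.println(m.countByFieldsField("changing"));     // 2
    }
}
```

With a random value per document, the enumeration visits roughly one term per document, so its cost grows with the number of distinct terms, while the "fields" lookup cost depends only on the size of one postings list.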

 

Besides other suggestions on how to accomplish that (very much welcome), 
I'd like to understand the expected performance of each of the above 
approaches, because in my case TermRangeQuery seems to be the least 
performant while the other two are, on average, on the same level.

 

One strange thing is that with TermRangeQuery and WildcardQuery the hit count is 
not fully correct: with 100k docs I get the correct hit count 
only with the TermQuery approach.

Code and sample outputs can be found at [1].

Any hint would be appreciated.

 

Regards,

Tommaso

 

[1] : https://gist.github.com/tteofili/52856d938fcd465eab58
