Hi,

2014-10-31 13:37 GMT+01:00 Uwe Schindler <[email protected]>:

> Hi,
>
>
>
> be aware that FieldValueFilter uses FieldCache (and may possibly use
> DocValues, if indexed with that – I am not sure for this case),
>

yes, I'm trying with (binary/sorted) docValues.


> so it might be slower on the first run. In any case, as this is a BitSet
> filter, it's best if executed with another query that drives the iteration.
> Otherwise it just stupidly increments document numbers until a match is
> found.
>

my use case is a plain "field xyz exists" query, so I am just
interested in retrieving those documents having the field xyz with whatever
value (empty string included).


>
>
> In theory, TermRangeQuery should return the same results, but maybe you
> have some issues with deleted documents?
>

no, that's just a testcase where I don't have deletions.


> Another thing might be that the wildcard does not match all your fields,
> e.g. maybe because it’s the empty string? In theory it should match, it
> would just be something to look into. Maybe there is a real bug.
>

the strange thing is that both WildcardQuery and TermRangeQuery return the
same (wrong) hit count.


> Which version of Lucene?
>

I'm using trunk


>
>
> Is the number returned by FieldValueFilter identical to
> TermRange/Wildcard? Or is it correct with respect to your other approach?
>

the FieldValueFilter and the TermQuery (meaning I index each doc's field
names into another field and search for fields:xyz) return the right number
(100k), while TermRangeQuery and WildcardQuery both return fewer hits. I
figured out it's because of empty strings, although, as you said, this
should work.
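
To make the discrepancy concrete, here is a toy model in plain Java. It is not Lucene code and uses no Lucene API: the class FieldExistsModel and the dropEmptyTerms flag are hypothetical names, and the idea that empty-string terms get lost somewhere is only an assumption made for illustration, not a confirmed cause. It just shows how a count derived from enumerating a field's terms can fall below a count derived from a separate field-names field.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeMap;

// Toy model only: a sorted term dictionary for one field plus a
// separate "fields" entry recording which docs carry that field.
final class FieldExistsModel {
    private final TreeMap<String, Set<Integer>> postings = new TreeMap<>();
    private final Set<Integer> docsWithField = new HashSet<>();
    // Hypothetical switch: simulates empty-string terms being lost.
    private final boolean dropEmptyTerms;

    FieldExistsModel(boolean dropEmptyTerms) {
        this.dropEmptyTerms = dropEmptyTerms;
    }

    void addDoc(int docId, String value) {
        // The field name is always recorded, whatever the value is.
        docsWithField.add(docId);
        if (dropEmptyTerms && value.isEmpty()) {
            return; // assumed loss of the empty term
        }
        postings.computeIfAbsent(value, k -> new HashSet<>()).add(docId);
    }

    // Analogue of a range/wildcard query: union of all terms' docs.
    int countByTermEnumeration() {
        Set<Integer> hits = new HashSet<>();
        for (Set<Integer> docs : postings.values()) {
            hits.addAll(docs);
        }
        return hits.size();
    }

    // Analogue of the TermQuery on the field-names field.
    int countByFieldName() {
        return docsWithField.size();
    }

    public static void main(String[] args) {
        FieldExistsModel index = new FieldExistsModel(true);
        String[] values = {"a", "", "b", "", "c"};
        for (int i = 0; i < values.length; i++) {
            index.addDoc(i, values[i]);
        }
        System.out.println(index.countByTermEnumeration() + " hits via term enumeration");
        System.out.println(index.countByFieldName() + " hits via field-name term");
    }
}
```

With dropEmptyTerms set to false both counts agree, which is the behaviour expected from the real queries.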

Regards,
Tommaso


>
>
> Uwe
>
>
>
> -----
>
> Uwe Schindler
>
> H.-H.-Meier-Allee 63, D-28213 Bremen
>
> http://www.thetaphi.de
>
> eMail: [email protected]
>
>
>
> *From:* Tommaso Teofili [mailto:[email protected]]
> *Sent:* Friday, October 31, 2014 1:21 PM
> *To:* [email protected]
> *Subject:* Re: "field exists" queries and benchmarks
>
>
>
> thanks Uwe!
>
>
>
> Performance does not seem much different (WildcardQuery seems to dominate);
> are there any specific docValues settings to make that work best?
>
>
>
> One more question, does anyone know why TermRangeQuery (somefield:[* TO
> *]) and WildcardQuery (somefield:*) do not return the exact number of docs
> having that field? See my test output for a field all 100k documents have
> (with a random value):
>
>    [junit4]   1> changing:[* TO *]
>
>    [junit4]   1> 99526 hits
>
>    [junit4]   1> changing:*
>
>    [junit4]   1> 99526 hits
>
>    [junit4]   1> fields:changing
>
>    [junit4]   1> 100000 hits
>
>
>
> Regards,
>
> Tommaso
>
>
>
>
>
> 2014-10-30 17:46 GMT+01:00 Uwe Schindler <[email protected]>:
>
> Hi,
>
>
>
> there is already a Filter available (that optimizes this special case):
>
>
> http://lucene.apache.org/core/4_10_1/core/org/apache/lucene/search/FieldValueFilter.html
>
>
>
> To make a query out of it, use ConstantScoreQuery. But this filter is
> better used as a real filter, because it is backed by a bitset.
>
>
>
> Uwe
>
>
>
>
>
> *From:* Tommaso Teofili [mailto:[email protected]]
> *Sent:* Thursday, October 30, 2014 5:34 PM
> *To:* [email protected]
> *Subject:* "field exists" queries and benchmarks
>
>
>
> Hi all,
>
>
>
> I'm doing some (rough) tests / benchmarks in order to understand what's
> the best way of doing a "field exists" query.
>
>
>
> As far as I could find, we can use TermRangeQuery (somefield:[* TO *]),
> WildcardQuery (somefield:*) or a plain TermQuery on another field where the
> doc's field names have been indexed (fields:somefield).
>
>
>
> Besides any other suggestions on how to accomplish that (very much
> welcome), I'd like to understand what the expected performance of each
> of the above approaches is, because in my case the TermRangeQuery seems to be
> the least performant while the other 2 are on average on the same level.
>
>
>
> One strange thing is that with TermRangeQuery and WildcardQuery the
> hit count is not fully correct, meaning that with 100k docs I get the
> correct hit count only with the TermQuery approach.
>
> Code and sample outputs can be found at [1].
>
> Any hint would be appreciated.
>
>
>
> Regards,
>
> Tommaso
>
>
>
> [1] : https://gist.github.com/tteofili/52856d938fcd465eab58
>
>
>
