Hello,
I am migrating a rather large application from Lucene 4.10 to Lucene 5.5.0.
Since Filters are deprecated in Lucene 5, I am looking for an efficient
replacement in our code.
We use many Filters that calculate the DocIdSet by doing a lookup of numeric
DocValues in some collection.
Everything is based on "long" types and results could be large.
The pseudocode in our Filter class looks like this:
@Override
public DocIdSet getDocIdSet(AtomicReaderContext context, Bits acceptDocs)
    throws IOException {
  AtomicReader reader = context.reader();
  OpenBitSet docSet = new OpenBitSet();
  NumericDocValues docValues = reader.getNumericDocValues(filterKeyName);
  for (int doc = 0; doc < reader.maxDoc(); doc++) {
    long value = docValues.get(doc); // get the DocValues value for the current doc
    if (isMatch(value)) {            // check value against some condition
      docSet.set(doc);               // set bit for doc
    }
  }
  return docSet;
}
What is the proper and efficient replacement for this kind of filtering?
Should I convert my matching value set into a TermsQuery and wrap with
ConstantScoreQuery?
I could do this, but then I am concerned about:
* Efficiency:
The set of matching values checked in the isMatch() method above could be very
large. I would need to create a large collection of Terms rather than the
memory-efficient DocIdSet.
* More efficiency:
From my current understanding, I would need to create a Term from the String
representation of my long value. Isn't this inefficient again?
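To make the two concerns concrete, here is a minimal plain-Java sketch. It uses no Lucene classes at all; the class name, method names, and sample data are made up for illustration. It only contrasts the two shapes of data: one bit per matching document versus one String per accepted long value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java illustration only; these are hypothetical stand-ins, not Lucene APIs.
class DocValuesFilterSketch {

    // Stand-in for the per-segment scan: sets one bit per document whose
    // value is contained in the accepted set.
    static BitSet matchingDocs(long[] docValues, Set<Long> accepted) {
        BitSet bits = new BitSet(docValues.length);
        for (int doc = 0; doc < docValues.length; doc++) {
            if (accepted.contains(docValues[doc])) {
                bits.set(doc);
            }
        }
        return bits;
    }

    // Stand-in for the term-based alternative: builds one String per
    // accepted long value, as a term-per-value approach would require.
    static List<String> asStringTerms(Set<Long> accepted) {
        List<String> terms = new ArrayList<>(accepted.size());
        for (long value : accepted) {
            terms.add(Long.toString(value));
        }
        return terms;
    }

    public static void main(String[] args) {
        long[] docValues = {5L, 7L, 5L, 9L};               // value of each doc, by doc id
        Set<Long> accepted = new HashSet<>(Arrays.asList(5L, 9L));
        System.out.println(matchingDocs(docValues, accepted));
        System.out.println(asStringTerms(accepted).size());
    }
}
```

The bit set grows with the number of documents in the segment, while the term list grows with the number of accepted values, each of which needs its own String object. That per-value allocation is the overhead I am worried about.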
I would really appreciate any recommendations on this.
Thanks a lot and best regards,
Josef