Hi,

Great, thanks a lot. 

Pointing me to RandomAccessWeight and the approach used in 
DocValuesNumbersQuery was exactly what I needed for my use case.
I created my own query type that takes advantage of the LongBitSet values I 
already have loaded. This lets me efficiently implement the Bits that match 
documents inside my own RandomAccessWeight implementation.
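
A minimal sketch of the membership check described above, for illustration 
only. It is not the actual query class. To stay self-contained it avoids the 
Lucene dependency: DocBits is a local stand-in for Lucene's Bits interface, 
java.util.BitSet stands in for the preloaded LongBitSet, and the valueByDoc 
array is a hypothetical stand-in for a per-document numeric doc value. In a 
real Lucene 6 query, an object like this is what the RandomAccessWeight 
subclass would hand back for each index segment.

```java
import java.util.BitSet;

// Local stand-in for Lucene's Bits interface (get/length), so the sketch
// compiles without Lucene on the classpath.
interface DocBits {
    boolean get(int docId);
    int length();
}

// Hypothetical class: answers "does this document match?" directly from a
// set that is already in memory, without creating one Term per value.
final class ValueSetBits implements DocBits {
    private final long[] valueByDoc; // assumed: one numeric value per document
    private final BitSet allowed;    // assumed: stand-in for the loaded LongBitSet

    ValueSetBits(long[] valueByDoc, BitSet allowed) {
        this.valueByDoc = valueByDoc;
        this.allowed = allowed;
    }

    @Override
    public boolean get(int docId) {
        long value = valueByDoc[docId];
        // java.util.BitSet is int-indexed, so values outside that range
        // cannot be members of the set in this sketch.
        return value >= 0 && value <= Integer.MAX_VALUE && allowed.get((int) value);
    }

    @Override
    public int length() {
        return valueByDoc.length;
    }
}
```

The cost per document is a single bit lookup, independent of how many values 
are in the set, which is why this pays off for large value sets.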

This approach is efficient when the number of values exceeds a certain 
threshold. Below that threshold, using TermsQuery is more efficient. 
I decide in my code which approach to use by applying my own heuristic.
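
A rough sketch of such a size-based switch, for illustration only. The actual 
heuristic is not spelled out in this message, so the class name, the enum and 
the single-threshold rule are all assumptions, and the cutoff is left as a 
parameter to be tuned by measurement.

```java
// Hypothetical names: the two strategies mentioned in the message.
enum FilterStrategy { TERMS_QUERY, BITS_BACKED_QUERY }

// Hypothetical chooser: picks a strategy purely from the number of values.
final class FilterStrategyChooser {
    private final int threshold; // assumed cutoff, to be tuned by measurement

    FilterStrategyChooser(int threshold) {
        if (threshold < 0) {
            throw new IllegalArgumentException("threshold must be >= 0");
        }
        this.threshold = threshold;
    }

    FilterStrategy choose(int valueCount) {
        // Strictly above the threshold: use the Bits-backed query.
        // At or below it: fall back to TermsQuery.
        return valueCount > threshold
                ? FilterStrategy.BITS_BACKED_QUERY
                : FilterStrategy.TERMS_QUERY;
    }
}
```

For example, new FilterStrategyChooser(20_000).choose(values.size()) would 
route small value sets to TermsQuery and large ones to the Bits-backed query. 
The 20,000 here is only a placeholder and not a measured cutoff.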

Overall, for larger value maps (above 20,000 entries), I decreased search time 
to about 10-30% of what it was before. For smaller value maps, search time 
stays efficient because TermsQuery is used.

Thanks again!

Josef

-----Original Message-----
From: Trejkaz [mailto:trej...@trypticon.org] 
Sent: Wednesday, June 27, 2018 4:51 AM
To: Lucene Users Mailing List
Subject: Re: Efficient way to define large Boolean Occur.FILTER clause in 
Lucene 6

On Tue, Jun 26, 2018 at 7:02 PM, Hasenberger, Josef
<josef.hasenber...@zetcom.com> wrote:
> However, I have a feeling that the conversion from Long values to Terms is
> rather inefficient for large collections and also uses a lot of memory.
> To ease conversion overhead somewhat, I created a class that converts a
> Long value directly to BytesRef instance (in order to avoid conversion to
> UTF16 and then UTF8 again) and pass that instance to the Term constructor.

First thought is, why are you using TermsQuery if they're in DocValues?
Is DocValuesTermsQuery any better? It does depend on how many terms
you're searching for.

Second thought is that there is also DocValuesNumbersQuery, which
avoids having to convert all the values.

> I just wonder if there is a better method for passing large amount of filter
> criteria to a BooleanQuery Occur.FILTER clause, that avoids excessive object
> creation.

If you can get your long values into something which implements Bits,
you could make a query using RandomAccessWeight to directly point at
the existing set you already have in memory.

TX

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org


