Re: unexpected performance TermsQuery Occur.SHOULD vs TermsInSetQuery?

Rob Audenaerde Tue, 13 Oct 2020 02:56:39 -0700

Hello Adrien,

Thanks for the swift reply. I'll add the details:


Lucene version: 8.6.2

The restrictionQuery is indeed a conjunction; it also allows a document to
be a hit if the 'roles' field is empty. It's used within a
bigger query builder, so maybe I did something else wrong. I'll rewrite the
benchmark to measure only the TermInSetQuery and TermQuery variants.

It never occurred (hah) to me to use Occur.FILTER; that is a good point to
check as well.

As you put it, I would expect the results to be very similar, as I do not
reach the 16 terms in the TermInSetQuery. I'll let you know what I find.

On Tue, Oct 13, 2020 at 11:48 AM Adrien Grand <jpou...@gmail.com> wrote:

> Can you give us a few more details:
>  - What version of Lucene are you testing?
>  - Are you benchmarking "restrictionQuery" on its own, or its conjunction
> with another query?
>
> You mentioned that you combine your "restrictionQuery" and the user query
> with Occur.MUST, Occur.FILTER feels more appropriate for "restrictionQuery"
> since it should not contribute to scoring.
>
> TermInSetQuery automatically executes like a BooleanQuery when the number
> of clauses is less than 16, so I would not expect major performance
> differences between a TermInSetQuery over less than 16 terms and a
> BooleanQuery wrapped in a ConstantScoreQuery.
>
> On Tue, Oct 13, 2020 at 11:35 AM Rob Audenaerde <rob.audenae...@gmail.com>
> wrote:
>
> > Hello,
> >
> > I'm benchmarking an application which implements security on Lucene by
> > adding a multi-value field "roles". If the user has one of these roles, he
> > can find the document.
> >
> > I implemented this as a Boolean AND query: I added the original query and
> > the restriction with Occur.MUST.
> >
> > I'm having some performance issues when counting the index (>60M docs), so
> > I thought about tweaking this restriction-implementation.
> >
> > I set up a benchmark like this:
> >
> > I generate 2M documents. Each document has a multi-value "roles" field.
> > The "roles" field in each document has 4 values, taken from
> > (2,2,1000,100) unique values.
> > The user has (1,1,2,1) values for roles (so, 1 out of the 2 for the first
> > role, 1 out of 2 for the second, 2 out of the 1000 for the third value,
> > and 1 / 100 for the fourth).
> >
> > I got a somewhat unexpected performance difference. At first, I
> > implemented the restriction query like this:
> >
> > for (final String role : roles) {
> >     restrictionQuery.add(
> >         new TermQuery(new Term("roles", new BytesRef(role))), Occur.SHOULD);
> > }
> >
> > I then switched to a TermInSetQuery, which I thought would be faster
> > as it is using constant-scores.
> >
> > final Set<BytesRef> rolesSet =
> >     roles.stream().map(BytesRef::new).collect(Collectors.toSet());
> > restrictionQuery.add(new TermInSetQuery("roles", rolesSet), Occur.SHOULD);
> >
> >
> > However, the TermInSetQuery achieves about 25% fewer ops/s. Is that to
> > be expected? I did not expect it, as I thought the constant-scoring would
> > be faster.
> >
>
>
> --
> Adrien
>
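
As a rough sanity check on the benchmark described in the quoted message, the
fraction of documents that the roles restriction matches can be estimated from
the numbers given there. This is only a sketch: it assumes each of the four
role values in a document is drawn uniformly and independently from its pool,
which the thread does not state, and the class and method names are made up
for illustration.

```java
public class RoleSelectivity {

    /**
     * Probability that a document shares at least one role with the user,
     * ASSUMING each slot is drawn uniformly and independently from its pool.
     *
     * @param uniqueValues pool size per role slot, e.g. (2,2,1000,100)
     * @param userValues   number of values the user holds per slot, e.g. (1,1,2,1)
     */
    static double matchProbability(int[] uniqueValues, int[] userValues) {
        double missAll = 1.0;
        for (int i = 0; i < uniqueValues.length; i++) {
            // Chance that slot i of the document is NOT one of the user's values.
            missAll *= 1.0 - (double) userValues[i] / uniqueValues[i];
        }
        // The document matches unless every slot misses.
        return 1.0 - missAll;
    }

    public static void main(String[] args) {
        int[] uniqueValues = {2, 2, 1000, 100};
        int[] userValues = {1, 1, 2, 1};
        double p = matchProbability(uniqueValues, userValues);
        System.out.println("match probability: " + p);
        System.out.println("expected hits in 2M docs: " + Math.round(p * 2_000_000));
    }
}
```

Under that assumption the restriction matches roughly three quarters of the
index, driven almost entirely by the two roles taken from pools of size 2. The
restriction is therefore not selective in this benchmark, which is worth
keeping in mind when comparing the two query variants.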
