Hello Adrien,

Thanks for the swift reply. I'll add the details:

Lucene version: 8.6.2

The restrictionQuery is indeed a conjunction; it also allows a document
to be a hit if the 'roles' field is empty. It's used within a bigger
query builder, so maybe I did something else wrong. I'll rewrite the
benchmark to measure just the TermInSetQuery and the TermQuery variants.

It never occurred (hah) to me to use Occur.FILTER; that is a good point
to check as well.

As you put it, I would expect the results to be very similar, as I do
not reach the 16 terms in the TermInSetQuery.

I'll let you know what I find.

On Tue, Oct 13, 2020 at 11:48 AM Adrien Grand <jpou...@gmail.com> wrote:

> Can you give us a few more details:
> - What version of Lucene are you testing?
> - Are you benchmarking "restrictionQuery" on its own, or its
>   conjunction with another query?
>
> You mentioned that you combine your "restrictionQuery" and the user
> query with Occur.MUST, Occur.FILTER feels more appropriate for
> "restrictionQuery" since it should not contribute to scoring.
>
> TermInSetQuery automatically executes like a BooleanQuery when the
> number of clauses is less than 16, so I would not expect major
> performance differences between a TermInSetQuery over less than 16
> terms and a BooleanQuery wrapped in a ConstantScoreQuery.
>
> On Tue, Oct 13, 2020 at 11:35 AM Rob Audenaerde
> <rob.audenae...@gmail.com> wrote:
>
> > Hello,
> >
> > I'm benchmarking an application which implements security on Lucene
> > by adding a multi-value field "roles". If the user has one of these
> > roles, he can find the document.
> >
> > I implemented this as a Boolean AND query, adding the original query
> > and the restriction with Occur.MUST.
> >
> > I'm having some performance issues when counting the index (>60M
> > docs), so I thought about tweaking this restriction implementation.
> >
> > I set up a benchmark like this:
> >
> > I generate 2M documents. Each document has a multi-value "roles"
> > field. The "roles" field in each document has 4 values, taken from
> > (2,2,1000,100) unique values.
> > The user has (1,1,2,1) values for roles (so, 1 out of the 2 for the
> > first role, 1 out of 2 for the second, 2 out of the 1000 for the
> > third value, and 1 / 100 for the fourth).
> >
> > I got a somewhat unexpected performance difference. At first, I
> > implemented the restriction query like this:
> >
> > for (final String role : roles) {
> >     restrictionQuery.add(
> >         new TermQuery(new Term("roles", new BytesRef(role))),
> >         Occur.SHOULD);
> > }
> >
> > I then switched to a TermInSetQuery, which I thought would be faster
> > as it is using constant-scores.
> >
> > final Set<BytesRef> rolesSet =
> >     roles.stream().map(BytesRef::new).collect(Collectors.toSet());
> > restrictionQuery.add(new TermInSetQuery("roles", rolesSet),
> >     Occur.SHOULD);
> >
> > However, the TermInSetQuery has about 25% slower ops/s. Is that to
> > be expected? I did not, as I thought the constant-scoring would be
> > faster.
>
> --
> Adrien
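For anyone trying to reproduce the benchmark setup described in the thread, it may help to know roughly how selective the restriction is. The sketch below is not from the original messages: it is plain Java with no Lucene dependency, and it assumes each of the four role slots is drawn uniformly and independently from its pool of (2,2,1000,100) values. The class name RoleSelectivity and both method names are made up for illustration. Under those assumptions, a user holding (1,1,2,1) roles would match about 0.753 of all documents.

```java
import java.util.Random;

public class RoleSelectivity {

    /**
     * Closed-form estimate: probability that a document matches at least
     * one of the user's roles. ASSUMPTION (not stated in the thread): each
     * slot is drawn uniformly and independently from its pool.
     */
    static double expectedMatchFraction(int[] poolSizes, int[] userRolesPerSlot) {
        double missAll = 1.0;
        for (int i = 0; i < poolSizes.length; i++) {
            // Chance that slot i holds none of the user's roles.
            missAll *= 1.0 - (double) userRolesPerSlot[i] / poolSizes[i];
        }
        return 1.0 - missAll;
    }

    /** Seeded simulation of the same setup, as a sanity check on the formula. */
    static double simulatedMatchFraction(int[] poolSizes, int[] userRolesPerSlot,
                                         int docs, long seed) {
        Random random = new Random(seed);
        int hits = 0;
        for (int d = 0; d < docs; d++) {
            boolean hit = false;
            for (int i = 0; i < poolSizes.length; i++) {
                // Without loss of generality the user holds values 0..k-1 of pool i.
                if (random.nextInt(poolSizes[i]) < userRolesPerSlot[i]) {
                    hit = true;
                }
            }
            if (hit) {
                hits++;
            }
        }
        return (double) hits / docs;
    }

    public static void main(String[] args) {
        int[] poolSizes = {2, 2, 1000, 100}; // unique values per role slot
        int[] userRoles = {1, 1, 2, 1};      // roles the user holds per slot
        System.out.printf("expected:  %.4f%n",
                expectedMatchFraction(poolSizes, userRoles));
        System.out.printf("simulated: %.4f%n",
                simulatedMatchFraction(poolSizes, userRoles, 2_000_000, 42L));
    }
}
```

If the real generator picks roles differently (for example with correlated slots), the fraction will differ, so treat this only as a rough guide to how many documents the restriction lets through in the benchmark.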