In MultiTermQueryConstantScoreWrapper there's this block around line 174 in 6.x:

do {
  docs = termsEnum.postings(docs, PostingsEnum.NONE);
  builder.add(docs);
} while (termsEnum.next() != null);

In the case of lots and lots of terms in a multiValued field this can
take quite a bit of time. In my test case I have 100K docs with 200M
terms (pathological I understand, but it illustrates the issue). If
I'm reading this right it loops through all the terms and, for each
term, creates a sub-list of docs for the term and adds the sub-list to
the "master list". So a query like 'field:*' takes 20+ seconds.

Is there anything we can/should do to short circuit this kind of
thing? In this case I got 200M terms by ngramming 3-32 (again, far too
many ngrams I understand). It's not clear to me whether it's an easy
check to say "stop when all the docs have been added to the master
list".

I can raise a JIRA if it makes sense.

For supporting this particular use-case, we could index a separate
field "has_field1_value" but the general case still holds.
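
To make the short-circuit idea concrete, here is a minimal sketch of the
"stop when all the docs have been added" check. It does not use Lucene at
all: the class ShortCircuitSketch, the Result holder and the method
unionWithEarlyExit are hypothetical names, postings are modelled as plain
int[] arrays, and the "master list" is a java.util.BitSet. It also assumes
every document counts, i.e. it ignores deleted docs, which a real check
against maxDoc would have to account for.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Hypothetical, Lucene-free model of the loop: one int[] of doc IDs per term.
final class ShortCircuitSketch {

  /** The union of matching docs plus how many terms were actually visited. */
  static final class Result {
    final BitSet docs;
    final int termsVisited;

    Result(BitSet docs, int termsVisited) {
      this.docs = docs;
      this.termsVisited = termsVisited;
    }
  }

  /**
   * Unions the postings of each term into one bit set, but stops as soon as
   * every doc in [0, maxDoc) has been seen (assumes no deleted docs).
   */
  static Result unionWithEarlyExit(List<int[]> postingsPerTerm, int maxDoc) {
    BitSet seen = new BitSet(maxDoc);
    int distinctDocs = 0;   // running count, avoids recounting the bit set
    int termsVisited = 0;

    for (int[] postings : postingsPerTerm) {
      termsVisited++;
      for (int doc : postings) {
        if (!seen.get(doc)) {
          seen.set(doc);
          distinctDocs++;
        }
      }
      if (distinctDocs == maxDoc) {
        break;              // every doc already matches; remaining terms add nothing
      }
    }
    return new Result(seen, termsVisited);
  }

  public static void main(String[] args) {
    List<int[]> postings = Arrays.asList(
        new int[] {0, 1}, new int[] {2}, new int[] {0, 2}, new int[] {1});
    Result r = unionWithEarlyExit(postings, 3);
    System.out.println(r.termsVisited + " of " + postings.size()
        + " terms visited, " + r.docs.cardinality() + " docs matched");
  }
}
```

The running counter is the point of the sketch: the check after each term is
a single integer comparison. Whether the real doc-ID builder can track a
distinct-doc count that cheaply is exactly the open question above.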

