In MultiTermConstantScoreWrapper there's this block around line 174 in 6x:

do {
  docs = termsEnum.postings(docs, PostingsEnum.NONE);
  builder.add(docs);
} while (termsEnum.next() != null);

With a very large number of terms in a multiValued field, this loop
can take quite a bit of time. My test case has 100K docs with 200M
terms (pathological, I understand, but it illustrates the issue). If
I'm reading this right, it loops through all the terms and, for each
term, creates a sub-list of docs for the term and adds the sub-list to
the "master list". So a query like 'field:*' takes 20+ seconds.

Is there anything we can or should do to short-circuit this kind of
thing? In this case I got 200M terms by ngramming 3-32 (again, far too
many ngrams, I understand). It's not clear to me whether it's an easy
check to say "stop when all the docs have been added to the master
list".
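
To make the "stop early" idea concrete, here is a rough plain-Java
model of it. This is not Lucene code: the class EarlyExitUnion, the
int[] postings arrays and the maxDoc parameter are all made up for
illustration, and java.util.BitSet stands in for the doc-id set
builder. It only shows the bookkeeping: count newly set bits and
break out of the term loop once every doc is in the set.

```java
import java.util.BitSet;
import java.util.List;

/** Plain-Java model of the early-exit idea; hypothetical, not Lucene code. */
public class EarlyExitUnion {

  /** The collected doc ids plus the number of terms actually visited. */
  public static final class Result {
    public final BitSet docs;
    public final int termsVisited;

    Result(BitSet docs, int termsVisited) {
      this.docs = docs;
      this.termsVisited = termsVisited;
    }
  }

  /**
   * Unions the postings of each term into one bit set, stopping as soon
   * as all maxDoc documents have been seen.
   *
   * @param postingsPerTerm one array of doc ids per term (assumed 0..maxDoc-1)
   * @param maxDoc          total number of documents that could match
   */
  public static Result union(List<int[]> postingsPerTerm, int maxDoc) {
    BitSet docs = new BitSet(maxDoc);
    int matched = 0;       // distinct docs collected so far
    int termsVisited = 0;

    for (int[] postings : postingsPerTerm) {
      termsVisited++;
      for (int doc : postings) {
        if (!docs.get(doc)) { // count a doc only the first time it shows up
          docs.set(doc);
          matched++;
        }
      }
      if (matched == maxDoc) {
        break;               // every doc is already in the set; skip the rest
      }
    }
    return new Result(docs, termsVisited);
  }
}
```

The catch is that the exit only fires if every doc really does have a
value in the field. With deleted docs, or docs that lack the field,
comparing against maxDoc never succeeds and the loop still visits all
the terms. In Lucene the bound would presumably have to be something
like Terms.getDocCount() instead, but I haven't checked how that
would fit into MultiTermConstantScoreWrapper.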

I can raise a JIRA if it makes sense.

To support this particular use case we could index a separate field,
"has_field1_value", but the general problem remains.

Erick
