In MultiTermConstantScoreWrapper there's this block around line 174 in 6x:

    do {
      docs = termsEnum.postings(docs, PostingsEnum.NONE);
      builder.add(docs);
    } while (termsEnum.next() != null);

When a multiValued field has a very large number of terms, this loop can take quite a bit of time. In my test case I have 100K docs with 200M terms (pathological, I understand, but it illustrates the issue). I got 200M terms by ngramming with sizes 3-32 (again, far too many ngrams, I understand).

If I'm reading this right, it loops through all the terms and, for each term, creates a sub-list of docs for that term and adds the sub-list to the "master list". So a query like 'field:*' takes 20+ seconds.

Is there anything we can/should do to short circuit this kind of thing? It's not clear to me whether it's an easy check to say "stop when all the docs have been added to the master list". I can raise a JIRA if it makes sense.

For this particular use case we could index a separate field "has_field1_value", but the general case still holds.

Erick
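
P.S. To make the short-circuit question concrete, here is a minimal sketch in plain Java. It is not Lucene code: the class, the method, and the int[][] standing in for per-term postings are all hypothetical, and it assumes the number of docs that can possibly match (targetDocCount) is known up front. In Lucene that number would presumably have to come from something like Terms.getDocCount() rather than maxDoc, since deleted docs and docs without the field never show up in postings, and it would only be valid when the query enumerates every term in the field. I have not checked how well that fits into the wrapper.

```java
import java.util.BitSet;

// Lucene-free stand-in for the postings loop; every name here is hypothetical.
final class ShortCircuitSketch {

    // Outcome of one pass: the collected docs and how many terms were read.
    static final class Result {
        final BitSet docs;
        final int termsVisited;

        Result(BitSet docs, int termsVisited) {
            this.docs = docs;
            this.termsVisited = termsVisited;
        }
    }

    // postingsPerTerm[i] holds the doc ids for term i
    // (stand-in for termsEnum.postings(...) followed by builder.add(docs)).
    // targetDocCount is the assumed number of docs that can match at all.
    static Result collect(int[][] postingsPerTerm, int maxDoc, int targetDocCount) {
        BitSet docs = new BitSet(maxDoc);
        int collected = 0;     // distinct docs seen so far
        int termsVisited = 0;  // terms whose postings we actually read

        for (int[] postings : postingsPerTerm) {
            termsVisited++;
            for (int doc : postings) {
                if (!docs.get(doc)) {
                    docs.set(doc);
                    collected++;
                }
            }
            // The short circuit: no later term can add a doc we do not already have.
            if (collected >= targetDocCount) {
                break;
            }
        }
        return new Result(docs, termsVisited);
    }
}
```

With four terms over three docs, where the first two terms already cover every doc, collect(...) reads two terms instead of four. Passing Integer.MAX_VALUE as targetDocCount disables the check and reproduces the current behaviour of visiting every term.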