Thanks, Mike. I'm not sure this _should_ be fixed, mind you, but I thought I'd ask.
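
The "stop when all the docs have been added to the master list" idea from the original mail can be sketched in plain Java, roughly as follows. This is not Lucene code: java.util.BitSet stands in for the doc-id set builder, and ShortCircuitSketch, collect, and Result are made-up names used only for illustration. It also assumes every doc id in [0, maxDoc) can eventually match; with deleted docs, or docs that have no value in the field, the early exit would never fire and all terms would still be visited.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Plain-Java stand-in for the term loop; none of these names are Lucene APIs.
class ShortCircuitSketch {

    /** Result of one collection pass: the matching docs and how many terms were read. */
    static final class Result {
        final BitSet docs;
        final int termsVisited;

        Result(BitSet docs, int termsVisited) {
            this.docs = docs;
            this.termsVisited = termsVisited;
        }
    }

    /**
     * Unions per-term postings into one bit set, stopping as soon as
     * every doc id in [0, maxDoc) has been seen.
     */
    static Result collect(List<int[]> postingsPerTerm, int maxDoc) {
        BitSet docs = new BitSet(maxDoc);
        int seen = 0;          // number of distinct docs collected so far
        int termsVisited = 0;
        for (int[] postings : postingsPerTerm) {
            termsVisited++;
            for (int doc : postings) {
                if (!docs.get(doc)) {
                    docs.set(doc);
                    seen++;
                }
            }
            if (seen == maxDoc) {
                break;         // every doc already matches; remaining terms add nothing
            }
        }
        return new Result(docs, termsVisited);
    }

    public static void main(String[] args) {
        // Four terms over three docs; the first two terms already cover all docs.
        List<int[]> postings = Arrays.asList(
                new int[] {0, 1},
                new int[] {1, 2},
                new int[] {0},
                new int[] {2});
        Result r = collect(postings, 3);
        System.out.println("docs=" + r.docs + " termsVisited=" + r.termsVisited);
    }
}
```

In this toy input the loop stops after the second term, since docs 0, 1, and 2 are all set by then. Whether the per-doc bookkeeping pays for itself in the real loop is an open question; it only helps when the terms cover all docs early.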

On Thu, Sep 22, 2016 at 10:16 AM, Michael McCandless <luc...@mikemccandless.com> wrote:
> You could index the prefix terms (edge ngrams), assuming your queries
> are prefix queries; this way there would typically be far fewer terms
> to visit than all 200M terms.
>
> Auto-prefix terms also tried to solve this more "automatically", so
> you don't have to mess with edge ngrams, but we reverted it because of
> the added code complexity and lack of real-world use cases, especially
> once we switched numerics from postings to dimensional points.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Thu, Sep 22, 2016 at 1:01 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>> In MultiTermConstantScoreWrapper there's this block around line 174 in 6x:
>>
>>   do {
>>     docs = termsEnum.postings(docs, PostingsEnum.NONE);
>>     builder.add(docs);
>>   } while (termsEnum.next() != null);
>>
>> In the case of lots and lots of terms in a multiValued field this can
>> take quite a bit of time. In my test case I have 100K docs with 200M
>> terms (pathological, I understand, but it illustrates the issue). If
>> I'm reading this right, it loops through all the terms and, for each
>> term, creates a sub-list of docs for the term and adds the sub-list to
>> the "master list". So a query like 'field:*' takes 20+ seconds.
>>
>> Is there anything we can/should do to short-circuit this kind of
>> thing? In this case I got 200M terms by ngramming 3-32 (again, far too
>> many ngrams, I understand). It's not clear to me whether it's an easy
>> check to say "stop when all the docs have been added to the master
>> list"....
>>
>> I can raise a JIRA if it makes sense.
>>
>> For supporting this particular use case, we could index a separate
>> field "has_field1_value", but the general case still holds.
>>
>> Erick
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org