Great optimization! I'm dubious about it being a good contribution to Lucene itself, however, because what you propose fits cleanly above Lucene. Even at an ES/Solr layer (which I know you don't use, but hypothetically speaking), I'm dubious as well.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley

On Mon, Dec 14, 2020 at 2:37 PM Michael Froh <[email protected]> wrote:

> My team at work has a neat feature that we've built on top of Lucene that
> has provided a substantial (20%+) increase in maximum qps and some
> reduction in query latency.
>
> Basically, we run a training process that looks at historical queries to
> find frequently co-occurring combinations of required clauses, say "+A +B
> +C +D". Then at indexing time, if a document satisfies one of these known
> combinations, we add a new term to the doc, like "opto:ABCD". At query
> time, we can then replace the required clauses with a single TermQuery for
> the "optimized" term.
>
> It adds a little bit of extra work at indexing time and requires the
> offline training step, but we've found that it yields a significant boost
> at query time.
>
> We're interested in open-sourcing this feature. Is it something worth
> adding to Lucene? Since it doesn't require any core changes, maybe as a
> module?
>
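
For readers following the thread, here is a rough sketch of the idea described in the quoted message, written in plain Python rather than against the Lucene API. It is only an illustration under stated assumptions: the names `KNOWN_COMBOS`, `opto_term`, `index_terms`, and `rewrite_required` are hypothetical, the combination list stands in for the output of the offline training step, and the greedy longest-first rewrite is one possible choice, not necessarily what Michael's team does.

```python
# Hypothetical output of the offline training step: sets of required
# clauses that frequently co-occur in historical queries.
KNOWN_COMBOS = [frozenset({"A", "B", "C", "D"})]


def opto_term(combo):
    """Build the synthetic term for a combination, e.g. 'opto:ABCD'."""
    return "opto:" + "".join(sorted(combo))


def index_terms(doc_terms):
    """Indexing time: add a synthetic term for every known combination
    that the document fully satisfies."""
    original = set(doc_terms)
    terms = set(original)
    for combo in KNOWN_COMBOS:
        if combo <= original:
            terms.add(opto_term(combo))
    return terms


def rewrite_required(required):
    """Query time: replace required clauses covered by a known combination
    with the single synthetic term (greedy, longest combination first).
    Clauses not covered by any combination are kept unchanged."""
    remaining = set(required)
    rewritten = []
    for combo in sorted(KNOWN_COMBOS, key=len, reverse=True):
        if combo <= remaining:
            rewritten.append(opto_term(combo))
            remaining -= combo
    return sorted(rewritten) + sorted(remaining)


if __name__ == "__main__":
    print(sorted(index_terms({"A", "B", "C", "D", "E"})))
    print(rewrite_required(["A", "B", "C", "D", "E"]))
```

The rewrite preserves results because a document carries the synthetic term exactly when it contains every clause in the combination, so matching on that one term is equivalent to requiring all of them.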
