Hi Lucene users,

I am developing a search application that needs to compute some basic summary statistics. We use Lucene 8.9.0. To improve performance for operations such as summing a value across 10,000 documents, we are using DocValues as columnar storage.
In order to retrieve the DocValues without collecting all hits into a TopDocs, which we found causes a lot of memory pressure and takes a long time, we are using the expert Collector query interface. Here's the code, simplified a bit for the list:

    final var collector = new Collector() {
      long sum = 0;

      @Override
      public ScoreMode scoreMode() {
        return ScoreMode.COMPLETE_NO_SCORES;
      }

      @Override
      public LeafCollector getLeafCollector(final LeafReaderContext context) throws IOException {
        if (context.docBase == 0) {
          sum = 0; // XXX: this should not be necessary?
        }
        final var subtotalValue = context.reader().getNumericDocValues("subtotal");
        return new LeafCollector() {
          @Override
          public void setScorer(final Scorable scorer) throws IOException {
            // scores are not needed
          }

          @Override
          public void collect(final int doc) throws IOException {
            if (subtotalValue.docID() > doc
                || !subtotalValue.advanceExact(doc)
                || subtotalValue.longValue() == 0) {
              return;
            }
            sum += subtotalValue.longValue();
          }
        };
      }
    };

    searcher.search(myQuery, collector);
    return collector.sum;

The query is a moderately complicated Boolean query with some TermQuery and MultiTermQuery instances combined together.

While first testing, I observed that the collector seems to be called twice for each matched document, and the sum is exactly double what you would expect. By printing out the Scorer, I see that the two calls come from two different BooleanScorer instances.

You can see my hack that resets the sum every time the collector starts at docBase 0, which I am sure is not the right approach, but it seems to work.

What is the right pattern to ensure my Collector only observes result documents once, no matter the input query? I see a note in the documentation that state is supposed to be stored on the Scorer implementation, but I am not providing a custom Scorer, nor do I actually want any scoring at all.

Thank you for any guidance!
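To make the symptom concrete without a Lucene dependency, here is a minimal plain-Java sketch of what I believe is happening. Everything in it is hypothetical: `Segment`, `SummingCollector`, `run`, and the sample values are stand-ins I made up for illustration, not Lucene classes. It also assumes that the whole sequence of segments is walked twice in order, which I have not confirmed is what Lucene actually does here.

```java
import java.util.List;

public class DocBaseResetSketch {
    // Hypothetical stand-in for one index segment: its docBase and the
    // "subtotal" value of each matching document in that segment.
    static final class Segment {
        final int docBase;
        final long[] subtotals;

        Segment(int docBase, long... subtotals) {
            this.docBase = docBase;
            this.subtotals = subtotals;
        }
    }

    // Hypothetical stand-in for the collector in this post.
    static final class SummingCollector {
        long sum = 0;
        final boolean resetAtDocBaseZero;

        SummingCollector(boolean resetAtDocBaseZero) {
            this.resetAtDocBaseZero = resetAtDocBaseZero;
        }

        // Mirrors getLeafCollector: called once per segment per pass.
        void startSegment(Segment segment) {
            if (resetAtDocBaseZero && segment.docBase == 0) {
                sum = 0; // the hack: forget everything seen in an earlier pass
            }
        }

        // Mirrors collect: zero values are skipped, everything else is added.
        void collect(long subtotal) {
            if (subtotal == 0) {
                return;
            }
            sum += subtotal;
        }
    }

    // Walks all segments `passes` times with one shared collector.
    static long run(List<Segment> segments, int passes, boolean resetAtDocBaseZero) {
        SummingCollector collector = new SummingCollector(resetAtDocBaseZero);
        for (int pass = 0; pass < passes; pass++) {
            for (Segment segment : segments) {
                collector.startSegment(segment);
                for (long subtotal : segment.subtotals) {
                    collector.collect(subtotal);
                }
            }
        }
        return collector.sum;
    }

    // Made-up sample data: two segments whose subtotals add up to 42.
    static List<Segment> sampleSegments() {
        return List.of(new Segment(0, 10, 20, 0), new Segment(3, 5, 7));
    }

    public static void main(String[] args) {
        System.out.println(run(sampleSegments(), 1, false)); // 42: one pass
        System.out.println(run(sampleSegments(), 2, false)); // 84: doubled
        System.out.println(run(sampleSegments(), 2, true));  // 42: hack hides it
    }
}
```

Under that assumption the reset only masks the duplicate pass rather than preventing it, and it would stop working if segments were visited in a different order or concurrently, which is why I would like to know the proper pattern.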
Steven