Hi Greg,

To better understand how much work gets duplicated, I went ahead and modified FloatTaxonomyFacets as an example [1]. It doesn't look too pretty, but it illustrates how I think multiple aggregations in one iteration could work.
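To make the idea concrete without the commit at hand, here is a simplified sketch in plain Java. It is hypothetical. Plain arrays (`ordinalsPerDoc`, `scores`, `popularity`) stand in for the taxonomy ordinals, the hit scores and the `popularity` DocValues, and it is neither the code in [1] nor Lucene's API. It shows each hit's ordinals being loaded once and feeding two aggregations in a single iteration: the demo's `sum(_score * sqrt(popularity))` and a `max(popularity)`.

```java
import java.util.Arrays;

/**
 * Hypothetical, simplified model of computing several per-ordinal aggregations
 * in one pass over the hits. Plain arrays stand in for taxonomy ordinals,
 * hit scores and the popularity DocValues; none of this is Lucene's API.
 */
class SinglePassFacetAggregation {

  /** Per-ordinal results for two aggregations over the same facet field. */
  static final class Result {
    final float[] sumScoreSqrtPopularity; // sum(_score * sqrt(popularity))
    final float[] maxPopularity;          // max(popularity)

    Result(int numOrdinals) {
      sumScoreSqrtPopularity = new float[numOrdinals];
      maxPopularity = new float[numOrdinals];
      // Ordinals with no matching hits keep -Infinity as their max.
      Arrays.fill(maxPopularity, Float.NEGATIVE_INFINITY);
    }
  }

  /**
   * @param hits           matching doc IDs
   * @param scores         score of each hit, aligned with {@code hits}
   * @param ordinalsPerDoc facet ordinals of every doc, indexed by doc ID
   * @param popularity     per-doc value, indexed by doc ID
   * @param numOrdinals    size of the ordinal space
   */
  static Result aggregate(int[] hits, float[] scores, int[][] ordinalsPerDoc,
                          float[] popularity, int numOrdinals) {
    Result r = new Result(numOrdinals);
    for (int i = 0; i < hits.length; i++) {
      int doc = hits[i];
      float value = popularity[doc];
      float weighted = scores[i] * (float) Math.sqrt(value);
      // The doc's ordinals are read once and feed both aggregations.
      for (int ord : ordinalsPerDoc[doc]) {
        r.sumScoreSqrtPopularity[ord] += weighted;
        r.maxPopularity[ord] = Math.max(r.maxPopularity[ord], value);
      }
    }
    return r;
  }

  public static void main(String[] args) {
    int[][] ordinalsPerDoc = {{0, 1}, {1}, {0, 2}};
    float[] popularity = {4f, 9f, 16f};
    // Docs 0 and 2 match the query; doc 1 does not.
    Result r = aggregate(new int[] {0, 2}, new float[] {1f, 0.5f},
        ordinalsPerDoc, popularity, 3);
    System.out.println(Arrays.toString(r.sumScoreSqrtPopularity)); // [4.0, 2.0, 2.0]
    System.out.println(Arrays.toString(r.maxPopularity));          // [16.0, 4.0, 16.0]
  }
}
```

Only the hit iteration and the ordinal lookup are shared; each aggregation still does its own per-ordinal update, so the saving is bounded by the cost of re-reading the ordinals.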
Overall, you're right: there's not as much wasted work as I had expected. I'll try to do a performance comparison to quantify precisely how much time we could save, just in case.

Thank you for the help!

Stefan

[1] https://github.com/stefanvodita/lucene/commit/3227dabe746858fc81b9f6e4d2ac9b66e8c32684

On Wed, 15 Feb 2023 at 15:48, Greg Miller <gsmil...@gmail.com> wrote:
>
> Hi Stefan-
>
> > In that case, iterating twice duplicates most of the work, correct?
>
> I'm not sure I'd agree that it duplicates "most" of the work. This is an
> association faceting example, which is a little bit of a special case in
> some ways. But, to your question, there is duplicated work here of
> re-loading the ordinals across the two aggregations, but I would suspect
> the more expensive work is actually computing the different aggregations,
> which is not duplicated. You're right that it would likely be more
> efficient to iterate the hits once, loading the ordinals once and computing
> multiple aggregations in one pass. There's no facility for doing that
> currently in Lucene's faceting module, but you could always propose it! :)
> That said, I'm not sure how common of a case this really is for the
> majority of users? But that's just a guess/assumption.
>
> Cheers,
> -Greg
>
> On Tue, Feb 14, 2023 at 3:19 AM Stefan Vodita <stefan.vod...@gmail.com>
> wrote:
>
> > Hi Greg,
> >
> > I see now where my example didn’t give enough info. In my mind, `Genre /
> > Author nationality / Author name` is stored in one hierarchical facet
> > field.
> > The data we’re aggregating over, like publish date or price, are stored in
> > DocValues.
> >
> > The demo package shows something similar [1], where the aggregation
> > is computed across a facet field using data from a `popularity` DocValue.
> >
> > In the demo, we compute `sum(_score * sqrt(popularity))`, but what if we
> > want several other different aggregations with respect to the same facet
> > field? Maybe we want `max(popularity)`. In that case, iterating twice
> > duplicates most of the work, correct?
> >
> >
> > Stefan
> >
> > [1]
> > https://github.com/apache/lucene/blob/7f8b7ffbcad2265b047a5e2195f76cc924028063/lucene/demo/src/java/org/apache/lucene/demo/facet/ExpressionAggregationFacetsExample.java#L91
> >
> > On Mon, 13 Feb 2023 at 22:46, Greg Miller <gsmil...@gmail.com> wrote:
> > >
> > > Hi Stefan-
> > >
> > > That helps, thanks. I'm a bit confused about where you're concerned with
> > > iterating over the match set multiple times. Is this a situation where the
> > > ordinals you want to facet over are stored in different index fields, so
> > > you have to create multiple Facets instances (one per field) to compute the
> > > aggregations? If that's the case, then yes—you have to iterate over the
> > > match set multiple times (once per field). I'm not sure that's such a big
> > > issue given that you're doing novel work during each iteration, so the only
> > > repetitive cost is actually iterating the hits. If the ordinals are
> > > "packed" into the same field though (which is the default in Lucene if
> > > you're using taxonomy faceting), then you should only need to do a single
> > > iteration over that field.
> > >
> > > Cheers,
> > > -Greg
> > >
> > > On Sat, Feb 11, 2023 at 2:27 AM Stefan Vodita <stefan.vod...@gmail.com>
> > > wrote:
> > >
> > > > Hi Greg,
> > > >
> > > > I’m assuming we have one match-set which was not constrained by any
> > > > of the categories we want to aggregate over, so it may have books by
> > > > Mark Twain, books by American authors, and sci-fi books.
> > > >
> > > > Maybe we can imagine we obtained it by searching for a keyword, say
> > > > “Washington”, which is present in Mark Twain’s writing, and those of other
> > > > American authors, and in sci-fi novels too.
> > > >
> > > > Does that make the example clearer?
> > > >
> > > > Stefan
> > > >
> > > > On Sat, 11 Feb 2023 at 00:16, Greg Miller <gsmil...@gmail.com> wrote:
> > > > >
> > > > > Hi Stefan-
> > > > >
> > > > > Can you clarify your example a little bit? It sounds like you want to
> > > > > facet over three different match sets (one constrained by "Mark Twain"
> > > > > as the author, one constrained by "American authors" and one
> > > > > constrained by the "sci-fi" genre). Is that correct?
> > > > >
> > > > > Cheers,
> > > > > -Greg
> > > > >
> > > > > On Fri, Feb 10, 2023 at 11:33 AM Stefan Vodita <stefan.vod...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > Let’s say I have an index of books, similar to the example in the
> > > > > > facet demo [1], with a hierarchical facet field encapsulating
> > > > > > `Genre / Author’s nationality / Author’s name`.
> > > > > >
> > > > > > I might like to find the latest publish date of a book written by
> > > > > > Mark Twain, the sum of the prices of books written by American
> > > > > > authors, and the number of sci-fi novels.
> > > > > >
> > > > > > As far as I understand, this would require faceting 3 times over the
> > > > > > match-set, one iteration for each aggregation of a different type
> > > > > > (max(date), sum(price), count). That seems inefficient if we could
> > > > > > instead compute all aggregations in one pass.
> > > > > >
> > > > > > Is there a way to do that?
> > > > > >
> > > > > > Stefan
> > > > > >
> > > > > > [1]
> > > > > > https://javadoc.io/doc/org.apache.lucene/lucene-demo/latest/org/apache/lucene/demo/facet/package-summary.html
> > > > > >
> > > > > > ---------------------------------------------------------------------
> > > > > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > > > > > For additional commands, e-mail: java-user-h...@lucene.apache.org
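The three aggregations from the original question (the latest publish date of a Mark Twain book, the sum of prices of books by American authors, and the count of sci-fi novels) can likewise share one iteration over the match set. What follows is a minimal, hypothetical sketch. `Book` and its fields are illustrative stand-ins for the facet path and DocValues, not Lucene types, and dates are assumed to be stored as epoch days. Because nationality sits below genre in the `Genre / Author nationality / Author name` hierarchy, "American authors" corresponds to one node per genre in a taxonomy; the sketch sidesteps that by matching the path components directly.

```java
import java.util.List;

/**
 * Hypothetical single-pass version of the three aggregations in the original
 * question. Book and its fields are illustrative stand-ins, not Lucene types.
 */
class BookAggregations {

  static final class Book {
    final String genre;
    final String nationality;
    final String author;
    final long publishEpochDay; // publish date as days since 1970-01-01 (assumption)
    final double price;

    Book(String genre, String nationality, String author, long publishEpochDay, double price) {
      this.genre = genre;
      this.nationality = nationality;
      this.author = author;
      this.publishEpochDay = publishEpochDay;
      this.price = price;
    }
  }

  long latestTwainDate = Long.MIN_VALUE; // max(date) where author is Mark Twain
  double americanPriceSum = 0.0;         // sum(price) where nationality is American
  int sciFiCount = 0;                    // count where genre is Sci-Fi

  /** Iterates the match set once, updating all three aggregations per hit. */
  static BookAggregations aggregate(List<Book> hits) {
    BookAggregations agg = new BookAggregations();
    for (Book b : hits) {
      if (b.author.equals("Mark Twain")) {
        agg.latestTwainDate = Math.max(agg.latestTwainDate, b.publishEpochDay);
      }
      if (b.nationality.equals("American")) {
        agg.americanPriceSum += b.price;
      }
      if (b.genre.equals("Sci-Fi")) {
        agg.sciFiCount++;
      }
    }
    return agg;
  }

  public static void main(String[] args) {
    // Toy match set; authors other than Mark Twain are placeholders.
    List<Book> hits = List.of(
        new Book("Fiction", "American", "Mark Twain", 100, 10.0),
        new Book("Fiction", "American", "Mark Twain", 250, 12.5),
        new Book("Sci-Fi", "American", "Author A", 300, 8.0),
        new Book("Sci-Fi", "British", "Author B", 50, 5.0));
    BookAggregations agg = aggregate(hits);
    System.out.println(agg.latestTwainDate);  // 250
    System.out.println(agg.americanPriceSum); // 30.5
    System.out.println(agg.sciFiCount);       // 2
  }
}
```

Each aggregation keeps its own filter and accumulator, so adding another aggregation adds per-hit work but no extra pass over the hits.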