Hey Shradha,

Such a contribution would be welcome. There is no good reason not to support richer aggregations in Lucene.

One thing that I have found interesting with faceting/aggregations is that every implementation seems to make different trade-offs, e.g.
- Lucene's faceting historically required adding side-car data, but we seem to want to make it work more and more with regular doc values instead of the side-car index?
- Both Lucene's faceting module and Solr (I think) load the set of matches into a bitset first, and then compute facets against this bitset while Elasticsearch computes aggregations within the collector.
- Both Elasticsearch and Solr have composable aggregations, e.g. break down by category, and then within each category by brand, but Lucene's facets don't support this.
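
To make the second trade-off concrete, here is a minimal sketch of the two strategies. It is a toy model only: plain arrays stand in for the index, and the names `FacetStrategies`, `CATEGORY`, `countViaBitSet` and `countWhileCollecting` are made up for illustration. It does not use, or claim to mirror, the actual Lucene, Solr or Elasticsearch code.

```java
import java.util.Arrays;
import java.util.BitSet;

// Toy model only: plain arrays stand in for an index; none of this is Lucene API.
final class FacetStrategies {

    // Hypothetical per-document values: CATEGORY[docId] is a category ordinal.
    static final int[] CATEGORY = {0, 1, 1, 2, 0, 1, 2, 2};
    static final int NUM_CATEGORIES = 3;

    // Strategy A: record every match in a bitset, then aggregate in a second pass.
    static int[] countViaBitSet(int[] matchingDocs) {
        BitSet matches = new BitSet(CATEGORY.length);
        for (int doc : matchingDocs) {
            matches.set(doc);
        }
        int[] counts = new int[NUM_CATEGORIES];
        for (int doc = matches.nextSetBit(0); doc >= 0; doc = matches.nextSetBit(doc + 1)) {
            counts[CATEGORY[doc]]++;
        }
        return counts;
    }

    // Strategy B: aggregate while collecting, with no intermediate match set.
    static int[] countWhileCollecting(int[] matchingDocs) {
        int[] counts = new int[NUM_CATEGORIES];
        for (int doc : matchingDocs) {
            counts[CATEGORY[doc]]++; // stands in for a collector's per-document callback
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] hits = {1, 2, 4, 7};
        System.out.println(Arrays.toString(countViaBitSet(hits)));
        System.out.println(Arrays.toString(countWhileCollecting(hits)));
    }
}
```

Both methods return the same counts. The difference is that the first one holds the whole match set in memory before aggregating, while the second touches each document exactly once as it is collected.
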

If you're going to build a new one, I have some suggestions:
- Let's avoid dependencies on side-car indexes?
- I don't think we should load matches into an int[] or BitSet. It takes too much memory. However it's also true that collecting docs one-by-one makes some things slower. Maybe we should look into doing something in-between like batching computation of aggregations? This could still allow taking advantage of e.g. vectorization if computing, say, the average of a field.

On Fri, Jun 16, 2023 at 4:14 PM Shradha Shankar <shradha.shan...@gmail.com> wrote:

> Hi Lucene devs,
>
> I work on product search at Amazon, where we use Lucene faceting
> to compute aggregations. There's a few functionalities I'm missing with
> faceting. For example, faceting will always aggregate all the way up to the
> dimension and it can't compute multiple aggregations in one pass of the
> match-set.
>
> Lucene-based search engines (like Elastic or OpenSearch) have feature-rich
> aggregation engines which allow different collection modes and give the user
> more control over the granularity of the scopes for which aggregations are
> computed.
>
> Are there historical reasons not to have this type of aggregation engine
> directly in Lucene? If it seems like a worthwhile idea to pursue, I've
> experimented a bit with how we could fulfill these needs in Lucene and I can
> open an issue/PR.
>
> Thanks,
> Shradha

--
Adrien
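
The in-between batching idea from the suggestions above could look roughly like the sketch below, here for the average of a field. This is an illustration under assumptions, not a proposed design: `BatchedAverage`, `BATCH_SIZE` and the plain `long[]` standing in for a numeric doc-values field are all hypothetical, and no Lucene API is used.

```java
// Toy model only: a plain array stands in for a numeric doc-values field; not Lucene API.
final class BatchedAverage {
    static final int BATCH_SIZE = 4; // arbitrary for the example; a real value would be tuned

    private final long[] fieldValues; // hypothetical per-document values
    private final int[] buffer = new int[BATCH_SIZE];
    private int buffered = 0;
    private long sum = 0;
    private long count = 0;

    BatchedAverage(long[] fieldValues) {
        this.fieldValues = fieldValues;
    }

    // Called once per matching document; only buffers the doc ID.
    void collect(int doc) {
        buffer[buffered++] = doc;
        if (buffered == BATCH_SIZE) {
            flush();
        }
    }

    // Aggregates the buffered docs in one tight loop, then resets the buffer.
    private void flush() {
        for (int i = 0; i < buffered; i++) {
            sum += fieldValues[buffer[i]];
        }
        count += buffered;
        buffered = 0;
    }

    // Flushes any partial batch and returns the average (NaN when nothing matched).
    double finish() {
        flush();
        return count == 0 ? Double.NaN : (double) sum / count;
    }

    public static void main(String[] args) {
        BatchedAverage avg = new BatchedAverage(new long[] {10, 20, 30, 40, 50, 60});
        for (int doc : new int[] {0, 1, 2, 3, 5}) {
            avg.collect(doc);
        }
        System.out.println(avg.finish()); // (10 + 20 + 30 + 40 + 60) / 5 = 32.0
    }
}
```

Memory stays bounded by the batch size rather than by the number of matches, and the per-batch loop has a simpler shape than a per-document callback. Whether a JIT would actually vectorize such a loop is not something this sketch demonstrates.
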