gianm commented on issue #12022: URL: https://github.com/apache/druid/issues/12022#issuecomment-986076579
To me this feels like a change that likely doesn't require incompatibility in order to perform well, for a couple specific reasons (below). So I do think we should do benchmarking before deciding to make things incompatible. > doing this in the proposed backwards compatible way adds a branch after every single call to the aggregator which, given that this is a hot method, could have significant inadvertent impacts to query performance I was thinking that query-time callers would stick with `void aggregate`, so no branching would be needed on those paths. Currently the users of Aggregator are timeseries and topN, which have a natural limitation on the number of Aggregators they create anyway: timeseries only makes one set per time bucket and topN has its threshold. In practice I haven't noticed any issues with these query types running out of memory, so I don't think there is a reason to add the overhead of size estimation. > My only worry with that is that the most common implementation of Aggregator.aggregate would likely be via that default implementation and I'm not sure if this would be adding yet another virtual function call to the hot path. This one seems like the kind of thing that the JVM should be able to optimize. Also, for the reason above, it's a code path that I'd only expect to get called during ingestion anyway, where aggregation time is not as big a driver of overall performance as it is during queries. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
