leventov opened a new issue #8335: Skip `positions` indirection in 
`PooledTopNAlgorithm` when the aggregation size is small
URL: https://github.com/apache/incubator-druid/issues/8335
 
 
   The `int[] positions` indirection in 
[`PooledTopNAlgorithm`](https://github.com/apache/incubator-druid/blob/566dc8c719489283f9190cefd6346bbb3f12955f/processing/src/main/java/org/apache/druid/query/topn/PooledTopNAlgorithm.java)
 seems wasteful, especially when the aggregation size itself is just 4 or 8 
bytes, as is the case for float/double/long Min/Max/Sum aggregations, leading to 
33%/50% higher memory usage than needed for processing. Its role, initializing 
the aggregation at the right moment, 1) can be replaced with a `BitSet 
dimIndexInitialized`; 2) may itself be unnecessary for aggregators 
that zero the memory as their initialization step: it may be faster to just 
zero the whole buffer's memory in one streaming pass at the beginning of 
processing.
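
   The two alternatives above can be sketched roughly as follows. This is a 
simplified illustration for a single 8-byte long-sum aggregation, not the actual 
`PooledTopNAlgorithm` code: the class `PositionsSketch`, its method names, the 
`UNASSIGNED` sentinel, and the `dimIds`/`values` inputs are all hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical sketch: three ways to lazily initialize per-dimension-value
// aggregation slots for a long-sum aggregator (8 bytes per slot).
class PositionsSketch {
    static final int UNASSIGNED = -1; // assumed sentinel, for illustration only

    // Variant A: int[] positions maps a dimension index to a densely packed slot.
    static long[] sumWithPositions(int[] dimIds, long[] values, int cardinality) {
        int[] positions = new int[cardinality];
        Arrays.fill(positions, UNASSIGNED);
        ByteBuffer buf = ByteBuffer.allocate(cardinality * Long.BYTES);
        int nextFree = 0;
        for (int row = 0; row < dimIds.length; row++) {
            int dimIndex = dimIds[row];
            int pos = positions[dimIndex];     // extra random access
            if (pos == UNASSIGNED) {
                pos = nextFree;                // slots are handed out in first-seen order
                nextFree += Long.BYTES;
                positions[dimIndex] = pos;
                buf.putLong(pos, 0L);          // aggregator init
            }
            buf.putLong(pos, buf.getLong(pos) + values[row]);
        }
        long[] out = new long[cardinality];
        for (int d = 0; d < cardinality; d++) {
            if (positions[d] != UNASSIGNED) {
                out[d] = buf.getLong(positions[d]);
            }
        }
        return out;
    }

    // Variant B: slot offset is computed from the dimension index;
    // a BitSet (1 bit per value) tracks which slots have been initialized.
    static long[] sumWithBitSet(int[] dimIds, long[] values, int cardinality) {
        BitSet dimIndexInitialized = new BitSet(cardinality);
        ByteBuffer buf = ByteBuffer.allocate(cardinality * Long.BYTES);
        for (int row = 0; row < dimIds.length; row++) {
            int dimIndex = dimIds[row];
            int pos = dimIndex * Long.BYTES;
            if (!dimIndexInitialized.get(dimIndex)) {
                dimIndexInitialized.set(dimIndex);
                buf.putLong(pos, 0L);          // aggregator init
            }
            buf.putLong(pos, buf.getLong(pos) + values[row]);
        }
        return readAll(buf, cardinality);
    }

    // Variant C: for aggregators whose init is "write zeros", clear the whole
    // buffer once up front and drop the per-row initialization check entirely.
    static long[] sumWithZeroedBuffer(int[] dimIds, long[] values, int cardinality) {
        ByteBuffer buf = ByteBuffer.allocate(cardinality * Long.BYTES);
        Arrays.fill(buf.array(), (byte) 0);    // stands in for clearing a reused pooled buffer
        for (int row = 0; row < dimIds.length; row++) {
            int pos = dimIds[row] * Long.BYTES;
            buf.putLong(pos, buf.getLong(pos) + values[row]);
        }
        return readAll(buf, cardinality);
    }

    private static long[] readAll(ByteBuffer buf, int cardinality) {
        long[] out = new long[cardinality];
        for (int d = 0; d < cardinality; d++) {
            out[d] = buf.getLong(d * Long.BYTES);
        }
        return out;
    }
}
```

   Note that variant C cannot distinguish a dimension value that was never seen 
from one whose sum is zero, so a real implementation would need to decide 
whether that distinction matters for the query result.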
   
   There is a locality concern for larger aggregations: `positions` facilitates 
putting the hottest aggregations together at the beginning of the buffer, thus 
improving cache and TLB utilization. This positive effect is completely 
canceled by accesses to `positions` itself (which are still random) for 
aggregations of 4 bytes, and almost certainly for aggregations of 8 bytes. Beyond 
that, experiments are needed to show at which aggregation size the 
positive effect of `positions` outweighs its negative effect (which also 
diminishes as the aggregation size grows): it's likely to be somewhere 
between 12 and 32 bytes, but benchmarking is required to determine the 
threshold more precisely.
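
   To make the diminishing memory cost concrete, here is a back-of-the-envelope 
sketch. It only does arithmetic on the 4-byte `int` entry that `positions` adds 
per dimension value; the class and method names are hypothetical, and it says 
nothing about cache or TLB behavior, which still needs benchmarking.

```java
// Hypothetical helper: per-dimension-value memory cost of the positions entry
// relative to the aggregation slot it points to.
class PositionsOverhead {
    // Bytes per dimension value when each slot also needs an int in positions.
    static int bytesWithPositions(int aggSize) {
        return aggSize + Integer.BYTES;
    }

    // Fraction of the total per-value memory that is taken by positions.
    static double positionsShare(int aggSize) {
        return (double) Integer.BYTES / bytesWithPositions(aggSize);
    }

    // Extra memory relative to storing the aggregation slot alone.
    static double extraOverDirect(int aggSize) {
        return (double) Integer.BYTES / aggSize;
    }
}
```

   For example, `positionsShare` gives one half for a 4-byte aggregation, one 
third for 8 bytes, and one ninth for 32 bytes, while a `BitSet` costs one bit 
per dimension value regardless of aggregation size.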

