mingmwang commented on issue #4973:
URL: 
https://github.com/apache/arrow-datafusion/issues/4973#issuecomment-1608796360

   @alamb @tustvold 
   
   If I understand correctly, in this new proposal, we will not use the `Row 
Layout` to manage the `Accumulator` states, right?
   I think the `Row Layout` is still useful in the case of high cardinality 
grouping along with multiple `Accumulators` (avg, min, max,..).   With the `Row 
Layout`, the states of multiple `Accumulators` belong to the same group can be 
updated in the tight loop. If we let each `Accumulator` manages its own state, 
I guess there will be more random memory access.
   
   And below is my observation:
   
   1).  The per-group type dispatching overhead is not the major bottleneck, 
that's also why the PR #6657 only get 10%~ 15%
   performance gain. 
   2).  The random memory access and mem stall should be the major bottleneck 
in the case of high cardinality grouping .
   
   And I checked DuckDB's source code, looks like they also use the  `Row 
Layout` to manage the aggregator states.
   They use C++ templates in the state update functions also. I'm going to take 
a closer look at the source code this week and next week.
   
   
https://github.com/duckdb/duckdb/blob/e49cf8132dc07324e0bb165593c8160837fcef3a/src/include/duckdb/common/types/row/row_layout.hpp
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to