Rachelint commented on code in PR #11943:
URL: https://github.com/apache/datafusion/pull/11943#discussion_r1720011628


##########
datafusion/expr-common/src/groups_accumulator.rs:
##########
@@ -31,6 +31,13 @@ pub enum EmitTo {
     /// For example, if `n=10`, group_index `0, 1, ... 9` are emitted
     /// and group indexes '`10, 11, 12, ...` become `0, 1, 2, ...`.
     First(usize),
+    /// Emit all groups managed by blocks
+    AllBlocks,
+    /// Emit only the first `n` group blocks,
+    /// similar as `First`, but used in blocked `GroupValues` and `GroupAccumulator`.
+    ///
+    /// For example, `n=3`, `block size=4`, finally 12 groups will be returned.
+    FirstBlocks(usize),

Review Comment:
   > > use the iterator approach to impl AllBlocks and FirstBlocks defined now
   > 
   > Not sure how this works, but it looks like a neat idea. If we apply the same idea to "element" (First and All), and consider it as a specialized case with block_size = 1, I think we could end up with a pretty nice abstraction. Probably we just need `EmitTo::Block(block_size)` 🤔 However, that is still far away from now. 😆
   
   🤔 Yes, in the iterator approach the other emit modes can indeed be seen as special cases with a specific block size. But for performance, it is better to set `batch_size == block_size`.
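To make the "element emission is block emission with `block_size = 1`" idea concrete, here is a minimal sketch. It is not DataFusion's API: `EmitMode` and `groups_to_emit` are hypothetical names, and clamping to the number of stored groups (so a partial last block is handled) is an assumption. The arithmetic follows the doc comment in the diff, where `FirstBlocks(3)` with a block size of 4 returns 12 groups.

```rust
/// Hypothetical, simplified emit modes modeled on `EmitTo` (not the real type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EmitMode {
    /// Emit every group (element-wise).
    All,
    /// Emit the first `n` groups (element-wise).
    First(usize),
    /// Emit every block.
    AllBlocks,
    /// Emit the first `n` blocks.
    FirstBlocks(usize),
}

/// Returns how many groups an emit would produce, given `total_groups` stored
/// in blocks of `block_size` groups each (the last block may be partial).
pub fn groups_to_emit(mode: EmitMode, total_groups: usize, block_size: usize) -> usize {
    assert!(block_size > 0, "block_size must be positive");
    match mode {
        // Element-wise modes are just block modes with a block size of 1.
        EmitMode::All => groups_to_emit(EmitMode::AllBlocks, total_groups, 1),
        EmitMode::First(n) => groups_to_emit(EmitMode::FirstBlocks(n), total_groups, 1),
        EmitMode::AllBlocks => total_groups,
        // `n` full blocks, clamped to what is actually stored (assumption).
        EmitMode::FirstBlocks(n) => n.saturating_mul(block_size).min(total_groups),
    }
}
```

With `block_size == batch_size`, each emitted block can be handed out as one output batch as-is, which is the performance argument for not using `block_size = 1` in practice.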
   
   After introducing the iterator approach, the sketch needs only 200+ lines of code, compared to 600+ in the stale version of the sketch.
   The main work is just adding a new stream state, `ExecutionState::ProducingBlocks(blocks)`.
   
https://github.com/Rachelint/arrow-datafusion/blob/d79d912d1677549c825cafc405911973ace0df46/datafusion/physical-plan/src/aggregates/row_hash.rs#L728
   
   Maybe it can show how the blocked optimization works.
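As a rough illustration of what a `ProducingBlocks` state does, here is a minimal sketch of the state machine, assuming a synchronous `Iterator` instead of an async `Stream` and `Vec<u64>` as a stand-in for an Arrow `RecordBatch`. `BlockedAggregateStream`, `Block`, and the variant payloads are hypothetical simplifications, not the code in the linked branch: once input is exhausted, all blocks are emitted at once (like `EmitTo::AllBlocks`) and then handed out one per call.

```rust
use std::collections::VecDeque;

/// Stand-in for an Arrow `RecordBatch`: one block of aggregated values.
pub type Block = Vec<u64>;

/// Hypothetical, simplified stream states.
pub enum ExecutionState {
    /// Still aggregating input; holds the blocks built so far.
    ReadingInput(Vec<Block>),
    /// Emitted blocks waiting to be handed out, one per call.
    ProducingBlocks(VecDeque<Block>),
    Done,
}

/// Hypothetical stream that emits all blocks after input ends.
pub struct BlockedAggregateStream {
    state: ExecutionState,
}

impl BlockedAggregateStream {
    /// Splits `values` into blocks of `block_size` (the last may be partial).
    pub fn new(values: Vec<u64>, block_size: usize) -> Self {
        assert!(block_size > 0, "block_size must be positive");
        let blocks: Vec<Block> = values.chunks(block_size).map(|c| c.to_vec()).collect();
        Self { state: ExecutionState::ReadingInput(blocks) }
    }
}

impl Iterator for BlockedAggregateStream {
    type Item = Block;

    fn next(&mut self) -> Option<Block> {
        loop {
            // Take ownership of the current state, leaving `Done` as a placeholder.
            match std::mem::replace(&mut self.state, ExecutionState::Done) {
                ExecutionState::ReadingInput(blocks) => {
                    // Input exhausted: emit every block at once and switch states.
                    self.state = ExecutionState::ProducingBlocks(blocks.into());
                }
                ExecutionState::ProducingBlocks(mut queue) => {
                    // With batch_size == block_size, each block is one output
                    // batch, so no slicing or concatenation is needed.
                    let block = queue.pop_front();
                    if block.is_some() {
                        self.state = ExecutionState::ProducingBlocks(queue);
                    }
                    return block;
                }
                ExecutionState::Done => return None,
            }
        }
    }
}
```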

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: github-h...@datafusion.apache.org
