Re: [PR] refactor: Generate GroupByHash output in multiple RecordBatches [arrow-datafusion]

via GitHub Mon, 15 Apr 2024 03:55:48 -0700


alamb commented on code in PR #9818:
URL: https://github.com/apache/arrow-datafusion/pull/9818#discussion_r1565589049



##########
datafusion/physical-plan/src/aggregates/row_hash.rs:
##########
@@ -787,7 +789,8 @@ impl GroupedHashAggregateStream {
         let timer = elapsed_compute.timer();
         self.exec_state = if self.spill_state.spills.is_empty() {
             let batch = self.emit(EmitTo::All, false)?;
-            ExecutionState::ProducingOutput(batch)
+            let batches = self.split_batch(batch)?;

Review Comment:
   I think the key point of the request is to avoid the call to 
`emit(EmitTo::All)`, or perhaps to change that call to return a 
`Vec<RecordBatch>`.
   
   Taking a single large record batch and slicing it up doesn't change how the 
underlying memory is allocated / laid out (that is, the slices still share the 
same large contiguous allocation).
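
   To illustrate the difference, here is a minimal std-only sketch. It does 
not use the Arrow or DataFusion APIs: `Column`, `split_by_slicing`, and 
`split_by_copying` are hypothetical stand-ins, assuming a column behaves like a 
reference-counted buffer plus an offset/length window.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for an Arrow array: a shared buffer plus a window.
#[derive(Clone)]
struct Column {
    buffer: Arc<Vec<i64>>,
    offset: usize,
    len: usize,
}

impl Column {
    fn new(values: Vec<i64>) -> Self {
        let len = values.len();
        Column { buffer: Arc::new(values), offset: 0, len }
    }

    /// Zero-copy slice: a new window over the same allocation.
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Column {
            buffer: Arc::clone(&self.buffer),
            offset: self.offset + offset,
            len,
        }
    }

    fn values(&self) -> &[i64] {
        &self.buffer[self.offset..self.offset + self.len]
    }
}

/// Split by slicing: every chunk keeps the whole original buffer alive.
fn split_by_slicing(col: &Column, batch_size: usize) -> Vec<Column> {
    assert!(batch_size > 0, "batch_size must be non-zero");
    (0..col.len)
        .step_by(batch_size)
        .map(|start| col.slice(start, batch_size.min(col.len - start)))
        .collect()
}

/// Split by copying: each chunk gets its own, smaller allocation.
fn split_by_copying(col: &Column, batch_size: usize) -> Vec<Column> {
    assert!(batch_size > 0, "batch_size must be non-zero");
    col.values()
        .chunks(batch_size)
        .map(|chunk| Column::new(chunk.to_vec()))
        .collect()
}

fn main() {
    let big = Column::new((0..10).collect());

    // Slices share the original allocation, so the large buffer
    // cannot be freed until every slice has been dropped.
    let sliced = split_by_slicing(&big, 4);
    assert!(sliced.iter().all(|c| Arc::ptr_eq(&c.buffer, &big.buffer)));

    // Copied chunks are independent and can be released one at a time.
    let copied = split_by_copying(&big, 4);
    assert!(copied.iter().all(|c| !Arc::ptr_eq(&c.buffer, &big.buffer)));

    println!("sliced chunks: {}, copied chunks: {}", sliced.len(), copied.len());
}
```

   In this sketch the sliced chunks all point at the one large buffer, whereas 
the copied chunks each own a small one. Copying after the fact still requires 
the large batch to exist first, which is why producing the smaller batches 
directly from `emit` would be preferable.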



