Dandandan opened a new pull request, #21330:
URL: https://github.com/apache/datafusion/pull/21330

   ## Which issue does this PR close?
   
   <!--
   Related to the same approach as the SortPreservingMergeExec lazy spawning 
work.
   -->
   
   ## Rationale for this change
   
   Currently, `RepartitionExec::execute()` eagerly calls 
`ensure_input_streams_initialized()`, which opens all input streams immediately, 
even before any output partition is polled. As a result, even when only one 
output partition is needed (or the consumer isn't ready yet), all input 
partitions are already executing and buffering data.
   
   For example, in a `HashJoinExec` plan, both the build-side and probe-side 
`RepartitionExec` nodes start pulling data eagerly, even though the probe side 
isn't consumed until the build side completes. This wastes memory and I/O.
   
   ## What changes are included in this PR?
   
   Removes the eager `ensure_input_streams_initialized` call from `execute()`. 
The `consume_input_streams` method (called on first poll via 
`futures::stream::once`) already handles the `NotInitialized` state, so this 
was purely redundant eager work.
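
A minimal, std-only sketch of the deferred pattern described above, for illustration only. It models a stream with a plain `Iterator` instead of an async stream, and every name in it (`Input`, `State`, `LazyInputs`, the `opened` counter) is hypothetical rather than taken from DataFusion. The point it tries to show is that the `NotInitialized` state is resolved on the first poll, so the constructor has no need to open inputs itself.

```rust
// Simplified model, not DataFusion code. All type names are hypothetical.

/// Stand-in for one opened input partition.
struct Input {
    rows: std::vec::IntoIter<u32>,
}

/// Mirrors the idea of a `NotInitialized` state resolved on first poll.
enum State {
    NotInitialized { partitions: Vec<Vec<u32>> },
    Consuming { inputs: Vec<Input>, current: usize },
}

struct LazyInputs {
    state: State,
    /// Number of inputs opened so far; kept only to make the laziness visible.
    opened: usize,
}

impl LazyInputs {
    /// Analogous to `execute()` after the change: builds the stream, opens nothing.
    fn execute(partitions: Vec<Vec<u32>>) -> Self {
        LazyInputs {
            state: State::NotInitialized { partitions },
            opened: 0,
        }
    }
}

impl Iterator for LazyInputs {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        // First poll: open every input here instead of in `execute()`.
        if let State::NotInitialized { partitions } = &mut self.state {
            let inputs: Vec<Input> = std::mem::take(partitions)
                .into_iter()
                .map(|p| Input { rows: p.into_iter() })
                .collect();
            self.opened = inputs.len();
            self.state = State::Consuming { inputs, current: 0 };
        }

        match &mut self.state {
            State::Consuming { inputs, current } => {
                // Drain the inputs one after another.
                while *current < inputs.len() {
                    if let Some(row) = inputs[*current].rows.next() {
                        return Some(row);
                    }
                    *current += 1;
                }
                None
            }
            State::NotInitialized { .. } => unreachable!("initialized above"),
        }
    }
}
```

In this model, `LazyInputs::execute(...)` leaves `opened` at zero until the first call to `next()`, which is the behavior the PR aims for in `RepartitionExec`.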
   
   Updates the `error_for_input_exec` test to expect the error on stream poll 
rather than on `execute()`, since initialization is now deferred.
   
   ## Are these changes tested?
   
   Covered by existing repartition tests (33 pass). One test was updated to 
match the new lazy behavior.
   
   ## Are there any user-facing changes?
   
   No API changes. Errors from input `execute()` calls are now surfaced on 
stream poll rather than on `execute()`, which is consistent with how other 
deferred errors work in DataFusion.
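
To illustrate what this means for callers, here is a small std-only sketch, again using a synchronous `Iterator` as a stand-in for an async stream. The names `open_input`, `DeferredStream`, and the error text are hypothetical and are not DataFusion APIs. Under these assumptions, `execute` always returns `Ok`, and a failure to open the input appears as the first item yielded by the stream.

```rust
// Simplified model, not DataFusion code. All names are hypothetical.

/// Stand-in for opening a child partition, which may fail.
fn open_input(fail: bool) -> Result<Vec<u32>, String> {
    if fail {
        Err("input failed to open".to_string())
    } else {
        Ok(vec![1, 2, 3])
    }
}

struct DeferredStream {
    fail: bool,
    rows: Option<std::vec::IntoIter<u32>>,
    done: bool,
}

/// Analogous to `execute()` after the change: it never opens the input,
/// so it cannot report an input error.
fn execute(fail: bool) -> Result<DeferredStream, String> {
    Ok(DeferredStream { fail, rows: None, done: false })
}

impl Iterator for DeferredStream {
    type Item = Result<u32, String>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.done {
            return None;
        }
        // The input is opened on the first poll, so its error surfaces here.
        if self.rows.is_none() {
            match open_input(self.fail) {
                Ok(rows) => self.rows = Some(rows.into_iter()),
                Err(e) => {
                    self.done = true;
                    return Some(Err(e));
                }
            }
        }
        let next = self.rows.as_mut().and_then(|rows| rows.next());
        if next.is_none() {
            self.done = true;
        }
        next.map(Ok)
    }
}
```

A caller that previously matched on the result of `execute()` to detect input failures would, in this model, need to check the items produced by the stream instead.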
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

