Re: [PR] Add reproducer for consecutive RepartitionExec [datafusion]

via GitHub Wed, 29 Oct 2025 04:32:10 -0700


gene-bordegaray commented on PR #18343:
URL: https://github.com/apache/datafusion/pull/18343#issuecomment-3461048191


   This makes sense. I just want to ask a clarifying question to make sure I 
understand: 
   - The Round Robin Repartition into the Aggregate is useful because it 
disperses work across partitions, each of which accumulates partial results. 
The Hash Repartition can then hand off the partial results with the same key 
(such as env = 'prod') to the same worker, which is more efficient.
   - The parquet query does not work this way because the Repartitions are not 
separated by the Aggregate. The Aggregate does all of its work on a single 
partition, and the repartitioning then happens too late.
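
   To make sure we are describing the same flow, here is a toy sketch of the 
first bullet in plain Python. It is not DataFusion code: every function name 
below is hypothetical, and a COUNT per key stands in for whatever the real 
Aggregate computes.

```python
from collections import defaultdict


def round_robin(rows, n):
    """Deal rows out to n partitions in turn, ignoring their keys."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def partial_aggregate(part):
    """Count keys within one partition only (the partial result)."""
    counts = defaultdict(int)
    for key in part:
        counts[key] += 1
    return dict(counts)


def hash_repartition(partials, n):
    """Route every (key, count) pair to the partition chosen by the key's hash."""
    parts = [[] for _ in range(n)]
    for partial in partials:
        for key, count in partial.items():
            parts[hash(key) % n].append((key, count))
    return parts


def final_aggregate(part):
    """Sum the partial counts that landed in one hash partition."""
    totals = defaultdict(int)
    for key, count in part:
        totals[key] += count
    return dict(totals)


def count_by_key(rows, n=4):
    """Round robin -> partial aggregate -> hash repartition -> final aggregate."""
    partials = [partial_aggregate(p) for p in round_robin(rows, n)]
    result = {}
    for part in hash_repartition(partials, n):
        # Each key lives in exactly one hash partition, so keys never collide here.
        result.update(final_aggregate(part))
    return result
```

   Calling `count_by_key(["prod", "dev", "prod", "staging", "prod"], n=3)` 
returns a count of 3 for 'prod' and 1 each for 'dev' and 'staging'. The point 
of the sketch is that the partial step runs on every partition, and the hash 
step only moves the already-reduced partial results.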


