andygrove commented on issue #5076:
URL: https://github.com/apache/arrow-datafusion/issues/5076#issuecomment-1523554165

   I'm possibly too influenced by Spark's approach, but option 1 seems to work
well. For example, Ballista has a `ShuffleWriterExec`, which is an execution
plan that executes its child query, repartitions the output, and writes it to
disk. `ShuffleWriterExec` then produces its own output, which is metadata about
the data that was written out.
   
   
https://github.com/apache/arrow-ballista/blob/main/ballista/core/src/execution_plans/shuffle_writer.rs#L329
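To make the pattern concrete, here is a minimal, self-contained Rust sketch of the shape described above. It models an operator that runs its child plan, hash-partitions the rows, writes each partition to a file, and returns only metadata about what it wrote. It does not use Ballista's or DataFusion's real types. `Plan`, `MemoryScan`, `ShuffleWriter`, and `ShuffleWritePartition` are hypothetical simplified stand-ins. Rows are assumed to be `(String, i64)` pairs, partitioned by hashing the string key and written as CSV lines.

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs;
use std::hash::{Hash, Hasher};
use std::io::Write;
use std::path::PathBuf;

/// Hypothetical, simplified stand-in for a physical plan node that yields rows.
trait Plan {
    fn execute(&self) -> Vec<(String, i64)>;
}

/// Hypothetical leaf plan that returns pre-loaded in-memory rows.
struct MemoryScan {
    rows: Vec<(String, i64)>,
}

impl Plan for MemoryScan {
    fn execute(&self) -> Vec<(String, i64)> {
        self.rows.clone()
    }
}

/// Metadata describing one shuffle output partition (hypothetical fields).
#[derive(Debug, Clone, PartialEq)]
struct ShuffleWritePartition {
    partition_id: usize,
    path: PathBuf,
    num_rows: usize,
}

/// Hypothetical shuffle-writing operator: its "output" is metadata, not rows.
struct ShuffleWriter<P: Plan> {
    child: P,
    num_partitions: usize,
    work_dir: PathBuf,
}

impl<P: Plan> ShuffleWriter<P> {
    fn execute(&self) -> std::io::Result<Vec<ShuffleWritePartition>> {
        // 1. Execute the child plan.
        let rows = self.child.execute();

        // 2. Hash-partition the rows by key so equal keys land in the same bucket.
        let mut buckets: Vec<Vec<(String, i64)>> = vec![Vec::new(); self.num_partitions];
        for row in rows {
            let mut hasher = DefaultHasher::new();
            row.0.hash(&mut hasher);
            let p = (hasher.finish() % self.num_partitions as u64) as usize;
            buckets[p].push(row);
        }

        // 3. Write each bucket to its own file and record metadata about it.
        fs::create_dir_all(&self.work_dir)?;
        let mut metadata = Vec::with_capacity(self.num_partitions);
        for (partition_id, bucket) in buckets.iter().enumerate() {
            let path = self.work_dir.join(format!("part-{}.csv", partition_id));
            let mut file = fs::File::create(&path)?;
            for (key, value) in bucket {
                writeln!(file, "{},{}", key, value)?;
            }
            metadata.push(ShuffleWritePartition {
                partition_id,
                path,
                num_rows: bucket.len(),
            });
        }

        // 4. Return only the metadata; downstream stages read the files via `path`.
        Ok(metadata)
    }
}
```

For example, wrapping a `MemoryScan` in a `ShuffleWriter` with `num_partitions: 3` and a directory under `std::env::temp_dir()` should yield three metadata entries whose `num_rows` sum to the input row count. In this sketch, a downstream stage would locate its input through those paths rather than receiving the rows directly.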
   
   That said, I have stronger opinions about including the logical plan 
representation than I do about how this is implemented in the physical plan.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
