andygrove commented on issue #5076: URL: https://github.com/apache/arrow-datafusion/issues/5076#issuecomment-1523554165
I'm possibly too influenced by Spark's approach, but option 1 seems to work well. For example, Ballista has a `ShuffleWriterExec`, an execution plan that executes its child query, repartitions the output, and writes it to disk. `ShuffleWriterExec` then produces its own output, which is metadata about the data that was written out.

https://github.com/apache/arrow-ballista/blob/main/ballista/core/src/execution_plans/shuffle_writer.rs#L329

That said, I have stronger opinions about including the logical plan representation than I do about how this is implemented in the physical plan.
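
To make the `ShuffleWriterExec` pattern concrete, here is a minimal standard-library-only sketch of a "writer as an operator" design: a plan node runs its child, hash-partitions the rows, writes one file per partition, and returns metadata instead of data. This is not Ballista's code. The `Plan` trait, `InMemoryScan`, `ShuffleWriter`, `PartitionMeta`, the `(String, i64)` row type, and the `part-N.csv` file naming are all hypothetical names chosen for illustration; the real operator works on Arrow record batches rather than plain tuples.

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs::File;
use std::hash::{Hash, Hasher};
use std::io::{self, BufWriter, Write};
use std::path::PathBuf;

/// Hypothetical row type: (partitioning key, value).
type Row = (String, i64);

/// Hypothetical stand-in for a physical operator.
trait Plan {
    type Output;
    fn execute(&self) -> io::Result<Vec<Self::Output>>;
}

/// Leaf operator that simply yields rows held in memory.
struct InMemoryScan {
    rows: Vec<Row>,
}

impl Plan for InMemoryScan {
    type Output = Row;
    fn execute(&self) -> io::Result<Vec<Row>> {
        Ok(self.rows.clone())
    }
}

/// What the writer emits: a description of each file it wrote.
#[derive(Debug, Clone, PartialEq)]
struct PartitionMeta {
    partition: usize,
    path: PathBuf,
    num_rows: usize,
}

/// Operator that consumes its child's rows and writes them out by partition.
struct ShuffleWriter<P> {
    child: P,
    num_partitions: usize,
    dir: PathBuf,
}

impl<P: Plan<Output = Row>> Plan for ShuffleWriter<P> {
    type Output = PartitionMeta;

    fn execute(&self) -> io::Result<Vec<PartitionMeta>> {
        if self.num_partitions == 0 {
            return Err(io::Error::new(
                io::ErrorKind::InvalidInput,
                "num_partitions must be greater than zero",
            ));
        }
        std::fs::create_dir_all(&self.dir)?;

        // 1. Execute the child and assign each row to a partition by key hash.
        let mut buckets: Vec<Vec<Row>> = vec![Vec::new(); self.num_partitions];
        for row in self.child.execute()? {
            let mut hasher = DefaultHasher::new();
            row.0.hash(&mut hasher);
            let p = (hasher.finish() % self.num_partitions as u64) as usize;
            buckets[p].push(row);
        }

        // 2. Write one file per partition and record metadata about it.
        let mut meta = Vec::with_capacity(self.num_partitions);
        for (partition, rows) in buckets.iter().enumerate() {
            let path = self.dir.join(format!("part-{partition}.csv"));
            let mut out = BufWriter::new(File::create(&path)?);
            for (key, value) in rows {
                writeln!(out, "{key},{value}")?;
            }
            out.flush()?;
            meta.push(PartitionMeta { partition, path, num_rows: rows.len() });
        }

        // 3. The operator's own output is the metadata, not the data.
        Ok(meta)
    }
}
```

Because the writer is just another node in the plan tree, a scheduler or parent operator can consume the returned metadata the same way it would consume any other operator's output.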
