alamb commented on PR #6009:
URL:
https://github.com/apache/arrow-datafusion/pull/6009#issuecomment-1509099461
I spent a non-trivial amount of time thinking about what a "PartitionAware" Union even
means.
It is entirely undocumented in
https://docs.rs/datafusion/22.0.0/datafusion/physical_plan/union/struct.UnionExec.html
Thanks to @mingmwang's and @crepererum's comments on
https://github.com/apache/arrow-datafusion/issues/5970#issuecomment-1508165252
Given that the operation seems so different, if we are going to keep the
partition-aware union, I think we should use a different structure name. Maybe
we could rename what is currently "UnionExec with preserve partitioning"
to `Interleave` -- that would imply that data from the different partitions
is kept segregated in its own partition, but interleaved within the output
partition streams.
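To make the distinction concrete, here is a toy sketch of the two behaviors as I understand them. It is not DataFusion code: `plain_union` and `interleave` are hypothetical names, each input is modeled as a plain list of partitions, and rows are concatenated in input order rather than in arrival order as real streams would be.

```python
from itertools import chain

# Toy model (an assumption for illustration, not DataFusion's API):
# an input is a list of partitions, and a partition is a list of rows.

def plain_union(inputs):
    """Append the partition lists: the output has sum(len(inp)) partitions,
    and each output partition comes from exactly one input partition."""
    return [list(partition) for inp in inputs for partition in inp]

def interleave(inputs):
    """Partition-aware variant: every input must have the same partition
    count, and output partition i holds the rows of partition i from
    every input."""
    counts = {len(inp) for inp in inputs}
    if len(counts) != 1:
        raise ValueError("all inputs must have the same number of partitions")
    n = counts.pop()
    return [list(chain.from_iterable(inp[i] for inp in inputs)) for i in range(n)]

a = [[1, 3], [2, 4]]   # input A: 2 partitions
b = [[5], [6, 8]]      # input B: 2 partitions

print(plain_union([a, b]))  # [[1, 3], [2, 4], [5], [6, 8]] -> 4 partitions
print(interleave([a, b]))   # [[1, 3, 5], [2, 4, 6, 8]]     -> 2 partitions
```

The point of the sketch is only the partition count: the first form grows it, while the second keeps it and mixes rows from different inputs inside each output partition.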
> The physical plan should show or display the UnionExec is partition-aware
> or ordering-aware clearly.
I will try to make a PR that does this, so that the current state of affairs
is easier to understand.