alamb opened a new issue, #4943:
URL: https://github.com/apache/arrow-datafusion/issues/4943
**Describe the bug**
Given the following input plan (which I see by enabling trace logging via
`RUST_LOG=trace`):
```text
SortExec: [tag@2 ASC NULLS LAST]
  ProjectionExec: expr=[bar@0 as bar, foo@1 as foo, tag@2 as tag, time@3 as time]
    DeduplicateExec: [tag@2 ASC,time@3 ASC]
      SortPreservingMergeExec: [tag@2 ASC,time@3 ASC]
        UnionExec
          ParquetExec: limit=None, partitions={1 group: [[d.parquet]]}, output_ordering=[tag@2 ASC, time@3 ASC], projection=[bar, foo, tag, time]
          SortExec: [tag@2 ASC,time@3 ASC]
            RecordBatchesExec: batches_groups=1 batches=1
```
Here is the input to enforce sorting:
```text
Optimized physical plan by EnforceDistribution:
SortExec: [tag@2 ASC NULLS LAST]
  CoalescePartitionsExec
    ProjectionExec: expr=[bar@0 as bar, foo@1 as foo, tag@2 as tag, time@3 as time]
      RepartitionExec: partitioning=RoundRobinBatch(4)
        DeduplicateExec: [tag@2 ASC,time@3 ASC]
          SortPreservingMergeExec: [tag@2 ASC,time@3 ASC]
            UnionExec    <-- ** Note that the ParquetExec is already sorted correctly!
              ParquetExec: limit=None, partitions={1 group: [[d.parquet]]}, output_ordering=[tag@2 ASC, time@3 ASC], projection=[bar, foo, tag, time]
              SortExec: [tag@2 ASC,time@3 ASC]
                RecordBatchesExec: batches_groups=1 batches=1
```
And here is the output from `EnforceSorting`, where it has moved the
SortExec up to the top of the union:
```text
Optimized physical plan by EnforceSorting:
SortExec: [tag@2 ASC NULLS LAST]
  CoalescePartitionsExec
    ProjectionExec: expr=[bar@0 as bar, foo@1 as foo, tag@2 as tag, time@3 as time]
      RepartitionExec: partitioning=RoundRobinBatch(4)
        DeduplicateExec: [tag@2 ASC,time@3 ASC]
          SortPreservingMergeExec: [tag@2 ASC,time@3 ASC]
            SortExec: [tag@2 ASC,time@3 ASC]    <-- ** SortExec is moved to the output of Union, *resorting* the parquet file
              UnionExec
                ParquetExec: limit=None, partitions={1 group: [[1/1/1/1/57d6a92a-314a-4a32-a633-33bc3e1fe7a3.parquet]]}, output_ordering=[tag@2 ASC, time@3 ASC], projection=[bar, foo, tag, time]
                RecordBatchesExec: batches_groups=1 batches=1
```
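To make the difference between the two plans concrete, here is a toy model of the situation. It is only a sketch and does not use DataFusion's API: the `Plan` enum and the functions `enforce_under_union` and `resorted_scans` are hypothetical names invented for illustration, and orderings are compared by exact equality rather than by DataFusion's real "ordering satisfies requirement" check.

```rust
// Toy model only: none of these types or functions exist in DataFusion.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// Leaf that may already produce sorted output
    /// (like a ParquetExec with `output_ordering`).
    Scan { name: String, ordering: Option<Vec<String>> },
    Sort { keys: Vec<String>, input: Box<Plan> },
    Union { inputs: Vec<Plan> },
}

impl Plan {
    /// Ordering this node guarantees on its output, if any.
    fn ordering(&self) -> Option<Vec<String>> {
        match self {
            Plan::Scan { ordering, .. } => ordering.clone(),
            Plan::Sort { keys, .. } => Some(keys.clone()),
            // In this model a union keeps an ordering only if every input shares it.
            Plan::Union { inputs } => {
                let first = inputs.first()?.ordering()?;
                if inputs.iter().all(|p| p.ordering().as_ref() == Some(&first)) {
                    Some(first)
                } else {
                    None
                }
            }
        }
    }
}

/// Build a union, adding a Sort only under inputs that do not already
/// have the required ordering (the placement the issue expects).
fn enforce_under_union(inputs: Vec<Plan>, required: &[String]) -> Plan {
    let inputs = inputs
        .into_iter()
        .map(|p| {
            if p.ordering().as_deref() == Some(required) {
                p // already sorted: leave untouched
            } else {
                Plan::Sort { keys: required.to_vec(), input: Box::new(p) }
            }
        })
        .collect();
    Plan::Union { inputs }
}

/// Collect the names of scans that declare an ordering but still sit
/// beneath a Sort, i.e. data that gets sorted a second time.
fn resorted_scans(plan: &Plan, under_sort: bool, out: &mut Vec<String>) {
    match plan {
        Plan::Scan { name, ordering } => {
            if under_sort && ordering.is_some() {
                out.push(name.clone());
            }
        }
        Plan::Sort { input, .. } => resorted_scans(input, true, out),
        Plan::Union { inputs } => {
            for p in inputs {
                resorted_scans(p, under_sort, out);
            }
        }
    }
}
```

In this model, a union built with `enforce_under_union` from one sorted scan and one unsorted scan reports no re-sorted scans, while wrapping the whole union in a single `Sort` (the shape `EnforceSorting` produced above) reports the sorted scan as re-sorted.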
**To Reproduce**
I have a reproducer from IOx -- see
https://github.com/influxdata/influxdb_iox/pull/6528#discussion_r1070632410
**Expected behavior**
I expect the `SortExec` to be left where it is (at the input to the `UnionExec`).
**Additional context**
I found this in the context of upgrading DataFusion in IOx:
https://github.com/influxdata/influxdb_iox/pull/6528