alamb opened a new issue, #4943:
URL: https://github.com/apache/arrow-datafusion/issues/4943

   **Describe the bug**
   
   Given the following input plan (I see this by enabling trace logging via `RUST_LOG=trace`):
   
   ```text
   SortExec: [tag@2 ASC NULLS LAST]
     ProjectionExec: expr=[bar@0 as bar, foo@1 as foo, tag@2 as tag, time@3 as time]
       DeduplicateExec: [tag@2 ASC,time@3 ASC]
         SortPreservingMergeExec: [tag@2 ASC,time@3 ASC]
           UnionExec
             ParquetExec: limit=None, partitions={1 group: [[d.parquet]]}, output_ordering=[tag@2 ASC, time@3 ASC], projection=[bar, foo, tag, time]
             SortExec: [tag@2 ASC,time@3 ASC]
               RecordBatchesExec: batches_groups=1 batches=1
   ```
   
   Here is the input to `EnforceSorting` (the plan as output by `EnforceDistribution`):
   
   ```text
   Optimized physical plan by EnforceDistribution:
   SortExec: [tag@2 ASC NULLS LAST]
     CoalescePartitionsExec
       ProjectionExec: expr=[bar@0 as bar, foo@1 as foo, tag@2 as tag, time@3 as time]
         RepartitionExec: partitioning=RoundRobinBatch(4)
           DeduplicateExec: [tag@2 ASC,time@3 ASC]
             SortPreservingMergeExec: [tag@2 ASC,time@3 ASC]
               UnionExec                                 <-- ** Note that the ParquetExec is already sorted correctly!
                 ParquetExec: limit=None, partitions={1 group: [[d.parquet]]}, output_ordering=[tag@2 ASC, time@3 ASC], projection=[bar, foo, tag, time]
                 SortExec: [tag@2 ASC,time@3 ASC]
                   RecordBatchesExec: batches_groups=1 batches=1
   ```
   
   And here is the output from `EnforceSorting`, which has moved the `SortExec` from the input of the `UnionExec` to its output:
   
   ```text
   Optimized physical plan by EnforceSorting:
   SortExec: [tag@2 ASC NULLS LAST]
     CoalescePartitionsExec
       ProjectionExec: expr=[bar@0 as bar, foo@1 as foo, tag@2 as tag, time@3 as time]
         RepartitionExec: partitioning=RoundRobinBatch(4)
           DeduplicateExec: [tag@2 ASC,time@3 ASC]
             SortPreservingMergeExec: [tag@2 ASC,time@3 ASC]
               SortExec: [tag@2 ASC,time@3 ASC]        <-- ** SortExec is moved to the output of Union, *resorting* the parquet file
                 UnionExec
                   ParquetExec: limit=None, partitions={1 group: [[1/1/1/1/57d6a92a-314a-4a32-a633-33bc3e1fe7a3.parquet]]}, output_ordering=[tag@2 ASC, time@3 ASC], projection=[bar, foo, tag, time]
                   RecordBatchesExec: batches_groups=1 batches=1
   ```
   
   **To Reproduce**
   I have a reproducer from IOx -- see https://github.com/influxdata/influxdb_iox/pull/6528#discussion_r1070632410
   
   **Expected behavior**
   I expect the `SortExec` to be left where it is (at the input to the `UnionExec`), so that the already-sorted `ParquetExec` output is not sorted again.
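   
   To make the difference between the two plan shapes concrete, here is a minimal sketch using a toy plan model. It does not use DataFusion's API: `Plan`, `scans_under_sort`, and `sort_unordered_inputs` are hypothetical names made up for illustration, and the model ignores `SortPreservingMergeExec` and partitioning entirely. It only reports which leaf scans have their rows pass through a sort.
   
   ```rust
   // Toy model of a physical plan. NOT DataFusion's API: every type and
   // function here is a hypothetical stand-in used only for illustration.
   #[derive(Debug, Clone, PartialEq)]
   enum Plan {
       /// Leaf scan; `sorted` records whether it already yields the required order.
       Scan { name: &'static str, sorted: bool },
       Sort(Box<Plan>),
       Union(Vec<Plan>),
   }
   
   /// Returns the names of the leaf scans whose rows pass through a sort.
   fn scans_under_sort(plan: &Plan) -> Vec<&'static str> {
       fn walk(plan: &Plan, under_sort: bool, out: &mut Vec<&'static str>) {
           match plan {
               Plan::Scan { name, .. } => {
                   if under_sort {
                       out.push(*name);
                   }
               }
               Plan::Sort(input) => walk(input, true, out),
               Plan::Union(inputs) => {
                   for input in inputs {
                       walk(input, under_sort, out);
                   }
               }
           }
       }
       let mut out = Vec::new();
       walk(plan, false, &mut out);
       out
   }
   
   /// Builds a union that sorts only the inputs that are not already ordered.
   fn sort_unordered_inputs(inputs: Vec<Plan>) -> Plan {
       let sorted_inputs = inputs
           .into_iter()
           .map(|p| {
               if matches!(p, Plan::Scan { sorted: false, .. }) {
                   Plan::Sort(Box::new(p))
               } else {
                   p
               }
           })
           .collect();
       Plan::Union(sorted_inputs)
   }
   
   fn main() {
       let parquet = Plan::Scan { name: "ParquetExec", sorted: true };
       let batches = Plan::Scan { name: "RecordBatchesExec", sorted: false };
   
       // Expected shape: the sort stays below the union, on the unsorted input only.
       let expected = sort_unordered_inputs(vec![parquet.clone(), batches.clone()]);
       // Reported shape: a single sort above the union covers both inputs.
       let reported = Plan::Sort(Box::new(Plan::Union(vec![parquet, batches])));
   
       println!("expected plan sorts: {:?}", scans_under_sort(&expected));
       println!("reported plan sorts: {:?}", scans_under_sort(&reported));
   }
   ```
   
   In this model the expected shape sorts only the `RecordBatchesExec` input, while the reported shape also sorts the `ParquetExec` input even though it is already ordered.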
   
   **Additional context**
   I found this in the context of upgrading DataFusion in IOx: https://github.com/influxdata/influxdb_iox/pull/6528

