wolffcm commented on issue #7077:
URL: 
https://github.com/apache/arrow-datafusion/issues/7077#issuecomment-1650199877

   @mustafasrepo Your PR will be a very nice improvement. 
   
   As I understand it, your PR looks for opportunities to remove explicit sorts
   when it's possible to preserve sort order instead, by transforming
   `RepartitionExec -> SortPreservingRepartitionExec` or
   `CoalescePartitionsExec -> SortPreservingMergeExec`.
   
   I think what I want to do is a similar generalization, but for the
   `pushdown_sorts` pass of `EnforceSorting`, so I don't think it conflicts
   with your open PR.
   
   As background, currently the pass `pushdown_sorts` works very nicely when a 
sort sits directly above a union:
   ```
   DeduplicateExec: [...]
     SortPreservingMergeExec: [...]
       SortExec: expr=[...]
         UnionExec
           RecordBatchesExec: batches_groups=1 batches=1 total_rows=1
           ParquetExec: file_groups={...}
   ```
   becomes
   ```
   DeduplicateExec: [...]
     SortPreservingMergeExec: [...]
       UnionExec
         SortExec: expr=[...]
           RecordBatchesExec: batches_groups=1 batches=1 total_rows=1
         ParquetExec: file_groups={...}
   ```
   -----
   What I want to do is extend `pushdown_sorts` to do something like this:
   ```
   DeduplicateExec: [...]
     SortExec: expr=[...]
       RepartitionExec: partitioning=Hash(...), input_partitions=12
         UnionExec
           RecordBatchesExec: batches_groups=1 batches=1 total_rows=1
           ParquetExec: file_groups={...}
   ```
   should become
   ```
   DeduplicateExec: [...]
     SortPreservingRepartitionExec: partitioning=Hash(...), input_partitions=12
       UnionExec
         SortExec: expr=[...]
           RecordBatchesExec: batches_groups=1 batches=1 total_rows=1
         ParquetExec: file_groups={...}
   ```
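
The rewrite I have in mind can be sketched on a toy plan tree. This is not DataFusion code: the `Plan` enum, `ensure_sorted`, and `push_down_sort` are hypothetical names made up for illustration, and the `sorted` flag on a leaf stands in for "this input already produces the required order" (which I'm assuming is why `ParquetExec` gets no `SortExec` in the plans above).

```rust
// Toy model of a physical plan. All names here are hypothetical and
// only mirror the operators shown in the plans above.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Dedup(Box<Plan>),
    Sort(Box<Plan>),
    Repartition(Box<Plan>),
    SortPreservingRepartition(Box<Plan>),
    Union(Vec<Plan>),
    // Leaf input; `sorted` is an assumption standing in for
    // "already yields the required ordering".
    Scan { name: &'static str, sorted: bool },
}

// Wrap a union input in a sort unless it is already sorted.
fn ensure_sorted(child: Plan) -> Plan {
    if matches!(child, Plan::Scan { sorted: true, .. }) {
        child
    } else {
        Plan::Sort(Box::new(child))
    }
}

// Rewrite Sort -> Repartition -> Union into
// SortPreservingRepartition -> Union -> (Sort on each unsorted input).
// Any other shape is returned unchanged.
fn push_down_sort(plan: Plan) -> Plan {
    match plan {
        Plan::Sort(input) => match *input {
            Plan::Repartition(inner) => match *inner {
                Plan::Union(children) => {
                    let sorted_children: Vec<Plan> =
                        children.into_iter().map(ensure_sorted).collect();
                    Plan::SortPreservingRepartition(Box::new(Plan::Union(sorted_children)))
                }
                other => Plan::Sort(Box::new(Plan::Repartition(Box::new(other)))),
            },
            other => Plan::Sort(Box::new(other)),
        },
        // Look through the operator above the sort.
        Plan::Dedup(input) => Plan::Dedup(Box::new(push_down_sort(*input))),
        other => other,
    }
}
```

Applied to the "before" plan above, this should yield the "after" plan: the sort ends up only on the `RecordBatchesExec` branch and the repartition becomes its sort-preserving variant. The real pass would of course have to compare actual sort expressions and ordering requirements rather than a boolean flag.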

