NGA-TRAN opened a new issue, #9898:
URL: https://github.com/apache/arrow-datafusion/issues/9898

   ### Is your feature request related to a problem or challenge?
   
   In InfluxDB IOx, we want to improve this query plan:
   
   ```
   Sort (col1, col2, col3, ... coln)
      Union
          Projection (col1, col2,  col3, ... coln)
          Projection (col1, col2,  col3, ... coln)
          Projection (col1, col2,  col3, ... coln)
   ```
   
   Since `col1` is the same for each projection, the sort can be pushed down 
below the Union and the new plan will be like this:
   
   ```
   ProgressiveEval                            <-- new operator (available in 
InfluxDB IOx that output data in their order)
      Union
            Sort(col2, col3, ... coln)     <-- sort is pushed down but only 
from col2 to coln
                 Projection (col1, col2,  col3, ... coln)
           Sort(col2, col3, ... coln)
                Projection (col1, col2,  col3, ... coln)
           .....
                .....
           Sort(col2, col3, ... coln)
                Projection (col1, col2,  col3, ... coln)
   ```
   
   There are now many sorts but each only work on a subset of data in parallel. 
Also, the `ProgressiveEval` ensure the sort streams is ordered by `col1` which 
is very cheap.
   
   The above plan would work for us, however, we hit an issue in DataFusion 
that the `Sorts`  under `Union` are always removed from the plan at the the 
`enforce_sorting` step  
https://github.com/apache/arrow-datafusion/blob/09f5a544d25f36ff1d65cc377123aee9b0e8f538/datafusion/core/src/physical_optimizer/enforce_sorting.rs#L361.
   
   After some investigation, we found the reason the sorts under union  are 
always removed  because the Union does not implement `required_input_ordering`. 
It uses the default 
https://github.com/apache/arrow-datafusion/blob/179179c0b719a7f9e33d138ab728fdc2b0e1e1d8/datafusion/physical-plan/src/lib.rs#L155
 which is always an array of None
   
   ### Describe the solution you'd like
   
   Implement `required_input_ordering` in UnionExec to have it ask its inputs 
to keep their sort order  if the Union has output_ordering
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to