alamb opened a new issue, #8120: URL: https://github.com/apache/arrow-datafusion/issues/8120
### Is your feature request related to a problem or challenge? While upgrading IOx, to use https://github.com/apache/arrow-datafusion/pull/8006, I found that the `EnforceSorting` pass was adding a sort node to an `ExecutionPlan`, even when the input to that plan was sorted correctly and `ExecutionPlan::maintains_input_order` returns true, where prior to https://github.com/apache/arrow-datafusion/pull/8006 it did not. The issue is that our node did not *also* correctly report its `EquivalenceProperties` which now the EnforceSorting pass relies on. Thus, I think we are in the situation where * the combination of `output_ordering`, and `equivalence_properties` is used by EnforceSorting * [`maintains_input_order`](https://github.com/search?q=repo%3Aapache%2Farrow-datafusion+maintains_input_order&type=code&p=2) is used in other places This leads to the situation where implementers of ExecutionPlan have to keep the three methods in sync, which I think this is confusing and error prone. ### Describe the solution you'd like I propose we deprecate `maintains_input_order` (with a hint about equivalence classes to help other users) and use only `output_ordering` and `equivalence_properties` to determine if the order is maintained ### Describe alternatives you've considered _No response_ ### Additional context # Example ```rust impl ExecutionPlan for DeduplicateExec { ... fn maintains_input_order(&self) -> Vec<bool> { vec![true] } ... } ``` For a plan like this ```text ProjectionExec: expr=[field_int@1 as field_int, tag1@2 as tag1, time@3 as time] DeduplicateExec: [tag1@2 ASC,time@3 ASC] SortExec: expr=[tag1@2 ASC,time@3 ASC,__chunk_order@0 ASC] RecordBatchesExec: chunks=1, projection=[__chunk_order, field_int, tag1, time] ``` ## Prior to https://github.com/apache/arrow-datafusion/pull/8006 No sort is added, ``` 2023-11-10T11:30:52.620358Z TRACE log: Optimized physical plan by EnforceSorting: OutputRequirementExec ProjectionExec: expr=[field_int@1 as field_int, tag1@2 as tag1, time@3 as time] DeduplicateExec: [tag1@2 ASC,time@3 ASC] SortExec: expr=[tag1@2 ASC,time@3 ASC,__chunk_order@0 ASC] RecordBatchesExec: chunks=1, projection=[__chunk_order, field_int, tag1, time] ``` ## After https://github.com/apache/arrow-datafusion/pull/8006 The EnforceSorting rule adds a `SortExec` at the top ```text 2023-11-10T11:29:45.120962Z TRACE log: Optimized physical plan by EnforceSorting: OutputRequirementExec SortExec: expr=[tag1@1 ASC,time@2 ASC] ProjectionExec: expr=[field_int@1 as field_int, tag1@2 as tag1, time@3 as time] DeduplicateExec: [tag1@2 ASC,time@3 ASC] SortExec: expr=[tag1@2 ASC,time@3 ASC,__chunk_order@0 ASC] RecordBatchesExec: chunks=1, projection=[__chunk_order, field_int, tag1, time] ``` ## Adding `equivalence_properties` fixed the problem: ```rust impl ExecutionPlan for DeduplicateExec { ... fn equivalence_properties(&self) -> EquivalenceProperties { // deduplicate does not change the equivalence properties self.input.equivalence_properties() } ... } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
