zhuqi-lucas commented on PR #19446:
URL: https://github.com/apache/datafusion/pull/19446#issuecomment-3691555775

   > > > I'm comfortable with `ProjectionExec` and `CooperativeExec` but I 
think `FilterExec` needs some more discussion since pushing ordering below 
filters might have a negative impact on performance for some queries / file 
sources.
   > > 
   > > 
   > > @adriangb Happy holidays! Thank you for the feedback! I completely 
understand your concern about FilterExec.
   > > If i makes sense right, i've updated the implementation to address the 
performance issue you mentioned. Now `FilterExec::try_pushdown_sort` only 
pushes down the sort when the child node already has an output ordering:
   > > ```rust
   > > fn try_pushdown_sort(
   > >         &self,
   > >         order: &[PhysicalSortExpr],
   > >     ) -> Result<SortOrderPushdownResult<Arc<dyn ExecutionPlan>>> {
   > >         if let Some(_child_ordering) = self.input.output_ordering() {
   > >             match self.input.try_pushdown_sort(order)? {
   > >                 SortOrderPushdownResult::Exact { inner } => {
   > >                     let new_exec = 
Arc::new(self.clone()).with_new_children(vec![inner])?;
   > >                     Ok(SortOrderPushdownResult::Exact { inner: new_exec 
})
   > >                 }
   > >                 SortOrderPushdownResult::Inexact { inner } => {
   > >                     let new_exec = 
Arc::new(self.clone()).with_new_children(vec![inner])?;
   > >                     Ok(SortOrderPushdownResult::Inexact { inner: 
new_exec })
   > >                 }
   > >                 SortOrderPushdownResult::Unsupported => {
   > >                     Ok(SortOrderPushdownResult::Unsupported)
   > >                 }
   > >             }
   > >         } else {
   > >             Ok(SortOrderPushdownResult::Unsupported)
   > >         }
   > >     }
   > > ```
   > > 
   > > 
   > >     
   > >       
   > >     
   > > 
   > >       
   > >     
   > > 
   > >     
   > >   
   > > This way:
   > > ```
   > > * When the data source can optimize to use pushdown sort, so we pushdown 
to leverage the existing ordering.
   > > 
   > > * When the data source is unsorted, we don't pushdown, so sorting 
happens after filtering on the smaller filtered dataset
   > > ```
   > > 
   > > 
   > >     
   > >       
   > >     
   > > 
   > >       
   > >     
   > > 
   > >     
   > >   
   > > This should avoid the performance degradation you were concerned about 
while still enabling the optimization for sorted data sources. The new tests 
validate this with the `WHERE timeframe = 'quarterly'` cases, which require 
pushdown through FilterExec to utilize the sorted Parquet file's ordering.
   > > What do you think about this approach?
   > 
   > That's a super interesting and smart approach! It absolutely makes sense 
and nukes the problem for cases where the sort orders match (i.e. you basically 
know the sort will be eliminated). But I feel for those cases `EnforceSorting` 
optimizer should already handle. For other cases it's still not clear if it 
always makes sense. Consider a DataSource that does accept sort pushdown (at a 
cost e.g. buffering in memory) but doesn't accept filter pushdown. How do we 
know when it makes sense to push down the sort?
   > 
   > This issue of not pushing things down past filters seems to have come up 
for both projection and sort pushdown. In both cases we are in a situation of 
"always push it down into the DataSource if possible, otherwise it depends on 
costs and we don't have a cost based evaluator so hard to say". I do think it's 
worth asking ourselves the question: why are there some `FilterExec`s left? 
Could we re-order the optimizer rules so that the filters always get pushed 
down first? Do we need to implement filter pushdown for more operators? Maybe 
you could share some of the scenarios / queries where the `FilterExec` is 
preventing sort pushdown?
   > 
   > I still think our best bet would be to remove that once piece from this PR 
so that we can merge the rest then we can continue to discuss the `FilterExec` 
case.
   
   Thank you @adriangb , i agree, and the i am also confused why the filterexec 
still be there for the following new slt testing:
   ```rust
   # Test 2.1: Query with constant prefix filter and DESC on remaining column
   # WHERE timeframe='quarterly' makes the first sort column constant
   # ORDER BY period_end DESC should trigger reverse scan because:
   # File ordering: [timeframe ASC, period_end ASC]
   # After filtering timeframe='quarterly': effectively [period_end ASC]
   # Request: [period_end DESC] -> exact reverse!
   query TT
   EXPLAIN SELECT * FROM timeseries_parquet
   WHERE timeframe = 'quarterly'
   ORDER BY period_end DESC
   LIMIT 2;
   ----
   logical_plan
   01)Sort: timeseries_parquet.period_end DESC NULLS FIRST, fetch=2
   02)--Filter: timeseries_parquet.timeframe = Utf8View("quarterly")
   03)----TableScan: timeseries_parquet projection=[timeframe, period_end, 
value], partial_filters=[timeseries_parquet.timeframe = Utf8View("quarterly")]
   physical_plan
   01)SortPreservingMergeExec: [period_end@1 DESC], fetch=2
   02)--SortExec: TopK(fetch=2), expr=[period_end@1 DESC], 
preserve_partitioning=[true]
   03)----FilterExec: timeframe@0 = quarterly
   04)------RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1
   05)--------DataSourceExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/sort_pushdown/timeseries_sorted.parquet]]},
 projection=[timeframe, period_end, value], file_type=parquet, 
predicate=timeframe@0 = quarterly AND DynamicFilter [ empty ], 
reverse_row_groups=true, pruning_predicate=timeframe_null_count@2 != 
row_count@3 AND timeframe_min@0 <= quarterly AND quarterly <= timeframe_max@1, 
required_guarantees=[timeframe in (quarterly)]
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to