jackwener commented on code in PR #8023:
URL: https://github.com/apache/arrow-datafusion/pull/8023#discussion_r1379545966
##########
datafusion/optimizer/src/push_down_filter.rs:
##########
@@ -33,31 +34,93 @@ use itertools::Itertools;
use std::collections::{HashMap, HashSet};
use std::sync::Arc;
-/// Push Down Filter optimizer rule pushes filter clauses down the plan
+/// Optimizer rule for pushing (moving) filter expressions down in a plan so
+/// they are applied as early as possible.
+///
/// # Introduction
-/// A filter-commutative operation is an operation whose result of
filter(op(data)) = op(filter(data)).
-/// An example of a filter-commutative operation is a projection; a
counter-example is `limit`.
///
-/// The filter-commutative property is column-specific. An aggregate grouped
by A on SUM(B)
-/// can commute with a filter that depends on A only, but does not commute
with a filter that depends
-/// on SUM(B).
+/// The goal of this rule is to improve query performance by eliminating
+/// redundant work.
+///
+/// For example, given a plan that sorts all values where `a > 10`:
+///
+/// ```text
+/// Filter (a > 10)
+/// Sort (a, b)
+/// ```
+///
+/// A better plan is to filter the data *before* the Sort, which sorts fewer
+/// rows and therefore does less work overall:
+///
+/// ```text
+/// Sort (a, b)
+/// Filter (a > 10) <-- filter is moved before the sort
+/// ```
+///
+/// However it is not always possible to push filters down. For example, given
a
+/// plan that finds the top 3 values and then keeps only those that are greater
+/// than 10, if the filter is pushed below the limit it would produce a
+/// different result.
+///
+/// ```text
+/// Filter (a > 10) <-- can not move this Filter before the limit
+/// Limit (fetch=3)
+/// Sort (a, b)
+/// ```
+///
+///
+/// More formally, a filter-commutative operation is an operation `op` that
+/// satisfies `filter(op(data)) = op(filter(data))`.
+///
+/// The filter-commutative property is plan and column-specific. A filter on
`a`
+/// can be pushed through a `Aggregate(group_by = [a], agg=[SUM(b))`. However,
a
+/// a filter on `SUM(b)` can not be pushed through the same aggregate.
+///
+/// # Handling Conjuctions
+///
+/// It is possible to only push down **part** of a filter expression if is
+/// connected with `AND`s (more formally if it is a "conjuction").
+///
+/// For example, given the following plan:
+///
+/// ```text
+/// Filter(a > 10 AND SUM(b) < 5)
+/// Aggregate(group_by = [a], agg = [SUM(b))
+/// ```
+///
+/// The `a > 10` is commutative with the `Aggregate` but `SUM(b) < 5` is not.
+/// Therefoe it is possible to only push part of the expression, resulting in:
Review Comment:
```suggestion
/// Therefore it is possible to only push part of the expression, resulting
in:
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]