[
https://issues.apache.org/jira/browse/SPARK-57437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated SPARK-57437:
-----------------------------------
Labels: pull-request-available (was: )
> Infer additional constraints by substituting attribute=literal bindings
> -----------------------------------------------------------------------
>
> Key: SPARK-57437
> URL: https://issues.apache.org/jira/browse/SPARK-57437
> Project: Spark
> Issue Type: Improvement
> Components: Optimizer
> Affects Versions: 4.3.0
> Reporter: James Xu
> Priority: Minor
> Labels: pull-request-available
>
> When a join ON clause contains a range predicate that references an attribute
> from the other side, and that attribute is bound to a literal elsewhere in
> the query, Catalyst fails to
> derive a pushable filter for the scan.
> For example:
> ```sql
> SELECT *
> FROM a
> LEFT JOIN b
> ON a.key = b.key
> AND b.pt >= f(a.pt)
> WHERE a.pt = '20260610';
> ```
> Because a.pt = '20260610' is known and the join condition contains b.pt >=
> f(a.pt), the predicate b.pt >= f('20260610') logically holds for any matching
> row from b. However, Catalyst
> does not infer this bound, so the filter cannot be pushed into b's scan.
> As a result, b is fully scanned even though only a small subset of partitions
> could possibly match.
> The same pattern occurs with other range or inequality predicates, such as:
> ```sql
> SELECT *
> FROM a
> JOIN b
> ON a.key = b.key
> AND b.v >= a.k
> AND b.v <= a.k + 10
> WHERE a.k = 5;
> ```
> In this case, any matching row from b must satisfy b.v >= 5 AND b.v <= 15,
> but again Catalyst does not derive these bounds and scans all of b.
> This leads to unnecessary full-table or full-partition scans,
> significantly increasing I/O and execution time for workloads that join a
> large partitioned table with a filtered
> attribute.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]