[ 
https://issues.apache.org/jira/browse/SPARK-57437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-57437:
-----------------------------------
    Labels: pull-request-available  (was: )

> Infer additional constraints by substituting attribute=literal bindings
> -----------------------------------------------------------------------
>
>                 Key: SPARK-57437
>                 URL: https://issues.apache.org/jira/browse/SPARK-57437
>             Project: Spark
>          Issue Type: Improvement
>          Components: Optimizer
>    Affects Versions: 4.3.0
>            Reporter: James Xu
>            Priority: Minor
>              Labels: pull-request-available
>
> When a join ON clause contains a range predicate that references an attribute 
> from the other side, and that attribute is bound to a literal elsewhere in 
> the query, Catalyst fails to
>    derive a pushable filter for the scan.
>    For example:
>    ```sql
>      SELECT *
>      FROM a
>      LEFT JOIN b
>        ON a.key = b.key
>       AND b.pt >= f(a.pt)
>      WHERE a.pt = '20260610';
>    ```
>    Because a.pt = '20260610' is known and the join condition contains b.pt >= 
> f(a.pt), the predicate b.pt >= f('20260610') logically holds for any matching 
> row from b. However, Catalyst
>    does not infer this bound, so the filter cannot be pushed into b's scan. 
> As a result, b is fully scanned even though only a small subset of partitions 
> could possibly match.
>    The same pattern occurs with other range or inequality predicates, such as:
>    ```sql
>      SELECT *
>      FROM a
>      JOIN b
>        ON a.key = b.key
>       AND b.v >= a.k
>       AND b.v <= a.k + 10
>      WHERE a.k = 5;
>    ```
>    In this case, any matching row from b must satisfy b.v >= 5 AND b.v <= 15, 
> but again Catalyst does not derive these bounds and scans all of b.
>    This leads to unnecessary full-table or full-partition scans, 
> significantly increasing I/O and execution time for workloads that join a 
> large partitioned table with a filtered
>    attribute.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to