[jira] [Created] (SPARK-40317) Improvement to JDBC predicate for queries involving joins

David Ahern (Jira) Fri, 02 Sep 2022 11:15:03 -0700

David Ahern created SPARK-40317:
-----------------------------------

             Summary: Improvement to JDBC predicate for queries involving joins
                 Key: SPARK-40317
                 URL: https://issues.apache.org/jira/browse/SPARK-40317
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.2.2
            Reporter: David Ahern



Current behaviour on tables involving joins seems to use a subquery as follows

 

select * from

(

select a, b, c from tbl1

lj tbl2 on tbl1.col1 = tbl2.col1

lj tbl3 on tbl1.col2 = tbl3.col2

)

where predicate = 1

where predicate = 2

where predicate = 3

 

More desirable would be

(

select a, b, c from tbl1 where (predicate = 1, predicate = 2, etc)

lj tbl2 on tbl1.col1 = tbl2.col1

lj tbl3 on tbl1.col2 = tbl3.col2

)

 

to just do the join on the subset of data rather than joining all data then 
filtering.  Predicate pushdown usually only works on columns that have been 
indexes.  So even if the data isn't indexed, this would reduce amount of data 
needing to be moved.  In many cases better to do the join on DB side than 
pulling everything into Spark.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Created] (SPARK-40317) Improvement to JDBC predicate for queries involving joins

Reply via email to