David Ahern created SPARK-40317:
-----------------------------------

             Summary: Improvement to JDBC predicate for queries involving joins
                 Key: SPARK-40317
                 URL: https://issues.apache.org/jira/browse/SPARK-40317
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.2.2
            Reporter: David Ahern

Current behaviour for queries involving joins seems to be to wrap the whole join in a subquery and apply the predicate on the outside, with one such query for each predicate (predicate = 1, predicate = 2, predicate = 3):

    select * from (
      select a, b, c
      from tbl1
      left join tbl2 on tbl1.col1 = tbl2.col1
      left join tbl3 on tbl1.col2 = tbl3.col2
    ) where predicate = 1

More desirable would be to apply the predicate to tbl1 before joining (again for predicate = 1, predicate = 2, etc.):

    select a, b, c
    from (select * from tbl1 where predicate = 1) tbl1
    left join tbl2 on tbl1.col1 = tbl2.col1
    left join tbl3 on tbl1.col2 = tbl3.col2

That way the join runs only on the subset of data, rather than joining all the data and then filtering.

Predicate pushdown usually only works on columns that have been indexed. Filtering before the join would reduce the amount of data that needs to be moved even if the data isn't indexed. In many cases it is better to do the join on the DB side than to pull everything into Spark.
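
To make the difference between the two query shapes concrete, here is a minimal sketch in plain Python that only builds the SQL strings. It is not Spark's implementation: the helper names `wrapped_queries` and `pushed_queries` are hypothetical, the table and column names are the placeholders from this issue, and it assumes the predicate refers only to columns of tbl1 (in which case filtering before or after a left join gives the same rows).

```python
from typing import List

# Join clauses copied from the example in this issue (placeholder names).
JOINS = (
    "left join tbl2 on tbl1.col1 = tbl2.col1 "
    "left join tbl3 on tbl1.col2 = tbl3.col2"
)


def wrapped_queries(predicates: List[str]) -> List[str]:
    """Shape described as current behaviour: join everything, then filter.

    Hypothetical helper; one query string is produced per predicate.
    """
    inner = f"select a, b, c from tbl1 {JOINS}"
    return [f"select * from ({inner}) t where {p}" for p in predicates]


def pushed_queries(predicates: List[str]) -> List[str]:
    """Requested shape: filter tbl1 first, then join only that subset.

    Assumes each predicate references tbl1 columns only.
    """
    return [
        f"select a, b, c from (select * from tbl1 where {p}) tbl1 {JOINS}"
        for p in predicates
    ]


predicates = ["predicate = 1", "predicate = 2", "predicate = 3"]
for query in pushed_queries(predicates):
    print(query)
```

Until something like this is done automatically, a possible workaround is to build the filtered-then-joined statement by hand and issue one JDBC read per predicate; that wiring is left out here because it needs a live database.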