[jira] [Updated] (SPARK-40608) `V2ScanRelationPushDown` supports extracting predicates in the output set

wang-zhun (Jira) Thu, 29 Sep 2022 02:56:04 -0700


     [ 
https://issues.apache.org/jira/browse/SPARK-40608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


wang-zhun updated SPARK-40608:
------------------------------
    Description: 
Currently does not push down `(ID = 20220921) OR (ID = 20220910)`, if `ID` is a 
partition field, it makes sense to implement push down
{code:java}
SELECT * FROM h2.test.people WHERE ID = 20220921 OR (ID =20220910 and 
split(NAME, ',')[0] == 'fred') {code}
h3. before
{code:java}
== Analyzed Logical Plan ==
NAME: string, ID: int
Project [NAME#5, ID#6]
+- Filter ((ID#6 = 20220921) OR ((ID#6 = 20220910) AND (split(NAME#5, ,, -1)[0] 
= fred)))
   +- SubqueryAlias h2.test.people
         +- RelationV2[NAME#5, ID#6] h2.test.people test.people

== Optimized Logical Plan ==
Filter ((ID#6 = 20220921) OR ((ID#6 = 20220910) AND (split(NAME#5, ,, -1)[0] = 
fred)))
+- RelationV2[NAME#5, ID#6] test.people

== Physical Plan ==
*(1) Filter ((ID#6 = 20220921) OR ((ID#6 = 20220910) AND (split(NAME#5, ,, 
-1)[0] = fred)))
+- *(1) Scan 
org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCScan$$anon$1@56c42964 
[NAME#5,ID#6] PushedFilters: [], ReadSchema: struct<NAME:string,ID:int> {code}
h3. after
{code:java}
== Analyzed Logical Plan ==
NAME: string, ID: int
Project [NAME#5, ID#6]
+- Filter ((ID#6 = 20220921) OR ((ID#6 = 20220910) AND (split(NAME#5, ,, -1)[0] 
= fred)))
   +- SubqueryAlias h2.test.people
         +- RelationV2[NAME#5, ID#6] h2.test.people test.people 

== Optimized Logical Plan ==
Filter ((ID#6 = 20220921) OR ((ID#6 = 20220910) AND (split(NAME#5, ,, -1)[0] = 
fred)))
+- RelationV2[NAME#5, ID#6] test.people

== Physical Plan ==
*(1) Filter ((ID#6 = 20220921) OR ((ID#6 = 20220910) AND (split(NAME#5, ,, 
-1)[0] = fred)))
+- *(1) Scan 
org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCScan$$anon$1@56c42964 
[NAME#5,ID#6] PushedFilters: [(ID = 20220921) OR (ID = 20220910)], ReadSchema: 
struct<NAME:string,ID:int>{code}

  was:
 

Currently it is not possible to push down `(ID = 20220921) OR (ID = 20220910)`, 
if `ID` is a partition field, it makes sense to implement push down
{code:java}
SELECT * FROM h2.test.people WHERE ID = 20220921 OR (ID =20220910 and 
split(NAME, ',')[0] == 'fred') {code}
h3. before

 

 
{code:java}
== Analyzed Logical Plan ==
NAME: string, ID: int
Project [NAME#5, ID#6]
+- Filter ((ID#6 = 20220921) OR ((ID#6 = 20220910) AND (split(NAME#5, ,, -1)[0] 
= fred)))
   +- SubqueryAlias h2.test.people
         +- RelationV2[NAME#5, ID#6] h2.test.people test.people

== Optimized Logical Plan ==
Filter ((ID#6 = 20220921) OR ((ID#6 = 20220910) AND (split(NAME#5, ,, -1)[0] = 
fred)))
+- RelationV2[NAME#5, ID#6] test.people

== Physical Plan ==
*(1) Filter ((ID#6 = 20220921) OR ((ID#6 = 20220910) AND (split(NAME#5, ,, 
-1)[0] = fred)))
+- *(1) Scan 
org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCScan$$anon$1@56c42964 
[NAME#5,ID#6] PushedFilters: [], ReadSchema: struct<NAME:string,ID:int> {code}
h3. after

 
{code:java}
== Analyzed Logical Plan ==
NAME: string, ID: int
Project [NAME#5, ID#6]
+- Filter ((ID#6 = 20220921) OR ((ID#6 = 20220910) AND (split(NAME#5, ,, -1)[0] 
= fred)))
   +- SubqueryAlias h2.test.people
         +- RelationV2[NAME#5, ID#6] h2.test.people test.people 

== Optimized Logical Plan ==
Filter ((ID#6 = 20220921) OR ((ID#6 = 20220910) AND (split(NAME#5, ,, -1)[0] = 
fred)))
+- RelationV2[NAME#5, ID#6] test.people

== Physical Plan ==
*(1) Filter ((ID#6 = 20220921) OR ((ID#6 = 20220910) AND (split(NAME#5, ,, 
-1)[0] = fred)))
+- *(1) Scan 
org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCScan$$anon$1@56c42964 
[NAME#5,ID#6] PushedFilters: [(ID = 20220921) OR (ID = 20220910)], ReadSchema: 
struct<NAME:string,ID:int>{code}
 

 

 


> `V2ScanRelationPushDown` supports extracting predicates in the output set
> -------------------------------------------------------------------------
>
>                 Key: SPARK-40608
>                 URL: https://issues.apache.org/jira/browse/SPARK-40608
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: wang-zhun
>            Priority: Major
>
> Currently does not push down `(ID = 20220921) OR (ID = 20220910)`, if `ID` is 
> a partition field, it makes sense to implement push down
> {code:java}
> SELECT * FROM h2.test.people WHERE ID = 20220921 OR (ID =20220910 and 
> split(NAME, ',')[0] == 'fred') {code}
> h3. before
> {code:java}
> == Analyzed Logical Plan ==
> NAME: string, ID: int
> Project [NAME#5, ID#6]
> +- Filter ((ID#6 = 20220921) OR ((ID#6 = 20220910) AND (split(NAME#5, ,, 
> -1)[0] = fred)))
>    +- SubqueryAlias h2.test.people
>          +- RelationV2[NAME#5, ID#6] h2.test.people test.people
> == Optimized Logical Plan ==
> Filter ((ID#6 = 20220921) OR ((ID#6 = 20220910) AND (split(NAME#5, ,, -1)[0] 
> = fred)))
> +- RelationV2[NAME#5, ID#6] test.people
> == Physical Plan ==
> *(1) Filter ((ID#6 = 20220921) OR ((ID#6 = 20220910) AND (split(NAME#5, ,, 
> -1)[0] = fred)))
> +- *(1) Scan 
> org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCScan$$anon$1@56c42964 
> [NAME#5,ID#6] PushedFilters: [], ReadSchema: struct<NAME:string,ID:int> {code}
> h3. after
> {code:java}
> == Analyzed Logical Plan ==
> NAME: string, ID: int
> Project [NAME#5, ID#6]
> +- Filter ((ID#6 = 20220921) OR ((ID#6 = 20220910) AND (split(NAME#5, ,, 
> -1)[0] = fred)))
>    +- SubqueryAlias h2.test.people
>          +- RelationV2[NAME#5, ID#6] h2.test.people test.people 
> == Optimized Logical Plan ==
> Filter ((ID#6 = 20220921) OR ((ID#6 = 20220910) AND (split(NAME#5, ,, -1)[0] 
> = fred)))
> +- RelationV2[NAME#5, ID#6] test.people
> == Physical Plan ==
> *(1) Filter ((ID#6 = 20220921) OR ((ID#6 = 20220910) AND (split(NAME#5, ,, 
> -1)[0] = fred)))
> +- *(1) Scan 
> org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCScan$$anon$1@56c42964 
> [NAME#5,ID#6] PushedFilters: [(ID = 20220921) OR (ID = 20220910)], 
> ReadSchema: struct<NAME:string,ID:int>{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Updated] (SPARK-40608) `V2ScanRelationPushDown` supports extracting predicates in the output set

Reply via email to