[
https://issues.apache.org/jira/browse/SPARK-40608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
wang-zhun updated SPARK-40608:
------------------------------
Description:
Currently does not push down `(ID = 20220921) OR (ID = 20220910)`, if `ID` is a
partition field, it makes sense to implement push down
{code:java}
SELECT * FROM h2.test.people WHERE ID = 20220921 OR (ID =20220910 and
split(NAME, ',')[0] == 'fred') {code}
h3. before
{code:java}
== Analyzed Logical Plan ==
NAME: string, ID: int
Project [NAME#5, ID#6]
+- Filter ((ID#6 = 20220921) OR ((ID#6 = 20220910) AND (split(NAME#5, ,, -1)[0]
= fred)))
+- SubqueryAlias h2.test.people
+- RelationV2[NAME#5, ID#6] h2.test.people test.people
== Optimized Logical Plan ==
Filter ((ID#6 = 20220921) OR ((ID#6 = 20220910) AND (split(NAME#5, ,, -1)[0] =
fred)))
+- RelationV2[NAME#5, ID#6] test.people
== Physical Plan ==
*(1) Filter ((ID#6 = 20220921) OR ((ID#6 = 20220910) AND (split(NAME#5, ,,
-1)[0] = fred)))
+- *(1) Scan
org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCScan$$anon$1@56c42964
[NAME#5,ID#6] PushedFilters: [], ReadSchema: struct<NAME:string,ID:int> {code}
h3. after
{code:java}
== Analyzed Logical Plan ==
NAME: string, ID: int
Project [NAME#5, ID#6]
+- Filter ((ID#6 = 20220921) OR ((ID#6 = 20220910) AND (split(NAME#5, ,, -1)[0]
= fred)))
+- SubqueryAlias h2.test.people
+- RelationV2[NAME#5, ID#6] h2.test.people test.people
== Optimized Logical Plan ==
Filter ((ID#6 = 20220921) OR ((ID#6 = 20220910) AND (split(NAME#5, ,, -1)[0] =
fred)))
+- RelationV2[NAME#5, ID#6] test.people
== Physical Plan ==
*(1) Filter ((ID#6 = 20220921) OR ((ID#6 = 20220910) AND (split(NAME#5, ,,
-1)[0] = fred)))
+- *(1) Scan
org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCScan$$anon$1@56c42964
[NAME#5,ID#6] PushedFilters: [(ID = 20220921) OR (ID = 20220910)], ReadSchema:
struct<NAME:string,ID:int>{code}
was:
Currently it is not possible to push down `(ID = 20220921) OR (ID = 20220910)`,
if `ID` is a partition field, it makes sense to implement push down
{code:java}
SELECT * FROM h2.test.people WHERE ID = 20220921 OR (ID =20220910 and
split(NAME, ',')[0] == 'fred') {code}
h3. before
{code:java}
== Analyzed Logical Plan ==
NAME: string, ID: int
Project [NAME#5, ID#6]
+- Filter ((ID#6 = 20220921) OR ((ID#6 = 20220910) AND (split(NAME#5, ,, -1)[0]
= fred)))
+- SubqueryAlias h2.test.people
+- RelationV2[NAME#5, ID#6] h2.test.people test.people
== Optimized Logical Plan ==
Filter ((ID#6 = 20220921) OR ((ID#6 = 20220910) AND (split(NAME#5, ,, -1)[0] =
fred)))
+- RelationV2[NAME#5, ID#6] test.people
== Physical Plan ==
*(1) Filter ((ID#6 = 20220921) OR ((ID#6 = 20220910) AND (split(NAME#5, ,,
-1)[0] = fred)))
+- *(1) Scan
org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCScan$$anon$1@56c42964
[NAME#5,ID#6] PushedFilters: [], ReadSchema: struct<NAME:string,ID:int> {code}
h3. after
{code:java}
== Analyzed Logical Plan ==
NAME: string, ID: int
Project [NAME#5, ID#6]
+- Filter ((ID#6 = 20220921) OR ((ID#6 = 20220910) AND (split(NAME#5, ,, -1)[0]
= fred)))
+- SubqueryAlias h2.test.people
+- RelationV2[NAME#5, ID#6] h2.test.people test.people
== Optimized Logical Plan ==
Filter ((ID#6 = 20220921) OR ((ID#6 = 20220910) AND (split(NAME#5, ,, -1)[0] =
fred)))
+- RelationV2[NAME#5, ID#6] test.people
== Physical Plan ==
*(1) Filter ((ID#6 = 20220921) OR ((ID#6 = 20220910) AND (split(NAME#5, ,,
-1)[0] = fred)))
+- *(1) Scan
org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCScan$$anon$1@56c42964
[NAME#5,ID#6] PushedFilters: [(ID = 20220921) OR (ID = 20220910)], ReadSchema:
struct<NAME:string,ID:int>{code}
> `V2ScanRelationPushDown` supports extracting predicates in the output set
> -------------------------------------------------------------------------
>
> Key: SPARK-40608
> URL: https://issues.apache.org/jira/browse/SPARK-40608
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.3.0
> Reporter: wang-zhun
> Priority: Major
>
> Currently does not push down `(ID = 20220921) OR (ID = 20220910)`, if `ID` is
> a partition field, it makes sense to implement push down
> {code:java}
> SELECT * FROM h2.test.people WHERE ID = 20220921 OR (ID =20220910 and
> split(NAME, ',')[0] == 'fred') {code}
> h3. before
> {code:java}
> == Analyzed Logical Plan ==
> NAME: string, ID: int
> Project [NAME#5, ID#6]
> +- Filter ((ID#6 = 20220921) OR ((ID#6 = 20220910) AND (split(NAME#5, ,,
> -1)[0] = fred)))
> +- SubqueryAlias h2.test.people
> +- RelationV2[NAME#5, ID#6] h2.test.people test.people
> == Optimized Logical Plan ==
> Filter ((ID#6 = 20220921) OR ((ID#6 = 20220910) AND (split(NAME#5, ,, -1)[0]
> = fred)))
> +- RelationV2[NAME#5, ID#6] test.people
> == Physical Plan ==
> *(1) Filter ((ID#6 = 20220921) OR ((ID#6 = 20220910) AND (split(NAME#5, ,,
> -1)[0] = fred)))
> +- *(1) Scan
> org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCScan$$anon$1@56c42964
> [NAME#5,ID#6] PushedFilters: [], ReadSchema: struct<NAME:string,ID:int> {code}
> h3. after
> {code:java}
> == Analyzed Logical Plan ==
> NAME: string, ID: int
> Project [NAME#5, ID#6]
> +- Filter ((ID#6 = 20220921) OR ((ID#6 = 20220910) AND (split(NAME#5, ,,
> -1)[0] = fred)))
> +- SubqueryAlias h2.test.people
> +- RelationV2[NAME#5, ID#6] h2.test.people test.people
> == Optimized Logical Plan ==
> Filter ((ID#6 = 20220921) OR ((ID#6 = 20220910) AND (split(NAME#5, ,, -1)[0]
> = fred)))
> +- RelationV2[NAME#5, ID#6] test.people
> == Physical Plan ==
> *(1) Filter ((ID#6 = 20220921) OR ((ID#6 = 20220910) AND (split(NAME#5, ,,
> -1)[0] = fred)))
> +- *(1) Scan
> org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCScan$$anon$1@56c42964
> [NAME#5,ID#6] PushedFilters: [(ID = 20220921) OR (ID = 20220910)],
> ReadSchema: struct<NAME:string,ID:int>{code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]