[ https://issues.apache.org/jira/browse/SPARK-33915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17255389#comment-17255389 ]
Ted Yu edited comment on SPARK-33915 at 12/31/20, 3:16 PM:
-----------------------------------------------------------
Here is the plan prior to predicate pushdown:
{code}
2020-12-26 03:28:59,926 (Time-limited test) [DEBUG - org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] Adaptive execution enabled for plan: Sort [id#34 ASC NULLS FIRST], true, 0
+- Project [id#34, address#35, phone#37, get_json_object(phone#37, $.code) AS phone#33]
   +- Filter (get_json_object(phone#37, $.code) = 1200)
      +- BatchScan[id#34, address#35, phone#37] Cassandra Scan: test.person
- Cassandra Filters: []
- Requested Columns: [id,address,phone]
{code}
Here is the plan with pushdown:
{code}
2020-12-28 01:40:08,150 (Time-limited test) [DEBUG - org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] Adaptive execution enabled for plan: Sort [id#34 ASC NULLS FIRST], true, 0
+- Project [id#34, address#35, phone#37, get_json_object(phone#37, $.code) AS phone#33]
   +- BatchScan[id#34, address#35, phone#37] Cassandra Scan: test.person
- Cassandra Filters: [[phone->'code' = ?, 1200]]
- Requested Columns: [id,address,phone]
{code}
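In the second plan the Filter node is gone and the scan reports `phone->'code' = ?` with 1200 bound to the placeholder, i.e. the predicate was handed to the source instead of being evaluated by Spark after the scan. The Scala below is a minimal sketch of that hand-off, assuming a simplified contract in the spirit of `SupportsPushDownFilters.pushFilters`, which returns the filters Spark must still evaluate. `SimpleFilter`, `JsonFieldEquals`, `OtherFilter`, `SourcePredicate` and `JsonPushdownSketch` are hypothetical names, not Spark or Cassandra connector classes, and the `->` rendering is copied from the log output above rather than from any connector source.

```scala
// Hypothetical filter model; NOT Spark's org.apache.spark.sql.sources.Filter.
sealed trait SimpleFilter
final case class JsonFieldEquals(column: String, jsonPath: String, value: Any) extends SimpleFilter
final case class OtherFilter(description: String) extends SimpleFilter

// A predicate the (hypothetical) source evaluates itself: a CQL-like
// fragment plus the value bound to its '?' placeholder.
final case class SourcePredicate(cql: String, bindValue: Any)

object JsonPushdownSketch {
  // Assumption: only one-level paths such as "$.code" are translatable.
  private val SimplePath = """\$\.([A-Za-z_][A-Za-z0-9_]*)""".r

  // Render a JSON equality in the form seen in the debug log: phone->'code' = ?
  def translate(f: SimpleFilter): Option[SourcePredicate] = f match {
    case JsonFieldEquals(col, SimplePath(field), v) =>
      Some(SourcePredicate(s"$col->'$field' = ?", v))
    case _ => None
  }

  // Split filters into (pushed to the source, still evaluated by Spark).
  // An empty second element corresponds to the plan without a Filter node.
  def pushFilters(filters: Seq[SimpleFilter]): (Seq[SourcePredicate], Seq[SimpleFilter]) = {
    val translated = filters.map(f => (f, translate(f)))
    val pushed = translated.collect { case (_, Some(p)) => p }
    val remaining = translated.collect { case (f, None) => f }
    (pushed, remaining)
  }
}
```

For example, `JsonPushdownSketch.pushFilters(Seq(JsonFieldEquals("phone", "$.code", 1200)))` yields one pushed predicate and no remaining filters, mirroring the second plan; a filter the sketch cannot translate stays in the second element, which is where a Spark Filter node would come from.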
was (Author: [email protected]):
Here is the plan prior to predicate pushdown:
{code}
2020-12-26 03:28:59,926 (Time-limited test) [DEBUG - org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] Adaptive execution enabled for plan: Sort [id#34 ASC NULLS FIRST], true, 0
+- Project [id#34, address#35, phone#37, get_json_object(phone#37, $.code) AS phone#33]
   +- Filter (get_json_object(phone#37, $.phone) = 1200)
      +- BatchScan[id#34, address#35, phone#37] Cassandra Scan: test.person
- Cassandra Filters: []
- Requested Columns: [id,address,phone]
{code}
Here is the plan with pushdown:
{code}
2020-12-28 01:40:08,150 (Time-limited test) [DEBUG - org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] Adaptive execution enabled for plan: Sort [id#34 ASC NULLS FIRST], true, 0
+- Project [id#34, address#35, phone#37, get_json_object(phone#37, $.code) AS phone#33]
   +- BatchScan[id#34, address#35, phone#37] Cassandra Scan: test.person
- Cassandra Filters: [[phone->'phone' = ?, 1200]]
- Requested Columns: [id,address,phone]
{code}
> Allow json expression to be pushable column
> -------------------------------------------
>
> Key: SPARK-33915
> URL: https://issues.apache.org/jira/browse/SPARK-33915
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.0.1
> Reporter: Ted Yu
> Priority: Major
>
> Currently PushableColumnBase provides no support for json / jsonb expressions.
> Example of a json expression:
> {code}
> get_json_object(phone, '$.code') = '1200'
> {code}
> If a non-string literal is part of the expression, the presence of cast() would
> complicate the situation.
> The implication is that an implementation of SupportsPushDownFilters has no
> chance to perform pushdown, even if the third-party DB engine supports json
> expression pushdown.
> This issue is for discussion and implementation of the Spark core changes that
> would allow json expressions to be recognized as pushable columns.
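One way to picture the requested change is an extractor that treats `get_json_object(<column>, '<path>')` as a pushable column, in the way PushableColumnBase maps an attribute to a column name. The Scala below is a self-contained sketch over hypothetical stand-in types (`Expr`, `Attr`, `Lit`, `GetJsonObject`, `Cast`, `EqualTo`, `PushableJsonColumn`, `JsonFieldEqualTo`, `JsonFilterTranslation`); they only mimic the shape of Catalyst expressions and are not Spark's API. It also shows one possible answer to the cast() complication: looking through casts wrapped around the JSON call.

```scala
// Hypothetical, heavily simplified stand-ins for Catalyst expressions.
// These are NOT Spark classes; they only model the shapes in the issue.
sealed trait Expr
final case class Attr(name: String) extends Expr
final case class Lit(value: Any) extends Expr
final case class GetJsonObject(json: Expr, path: Expr) extends Expr
final case class Cast(child: Expr, targetType: String) extends Expr
final case class EqualTo(left: Expr, right: Expr) extends Expr

// Source-neutral result: "json field at jsonPath of column equals value".
final case class JsonFieldEqualTo(column: String, jsonPath: String, value: Any)

// Hypothetical extractor playing, for JSON, the role the issue asks of
// PushableColumnBase: recognizes get_json_object(<attribute>, '<path>'),
// unwrapping any number of cast() nodes around it.
object PushableJsonColumn {
  def unapply(e: Expr): Option[(String, String)] = e match {
    case GetJsonObject(Attr(col), Lit(path: String)) => Some((col, path))
    case Cast(inner, _)                              => unapply(inner)
    case _                                           => None
  }
}

object JsonFilterTranslation {
  // Accept the literal on either side of the equality.
  def translate(e: Expr): Option[JsonFieldEqualTo] = e match {
    case EqualTo(PushableJsonColumn(col, path), Lit(v)) => Some(JsonFieldEqualTo(col, path, v))
    case EqualTo(Lit(v), PushableJsonColumn(col, path)) => Some(JsonFieldEqualTo(col, path, v))
    case _                                              => None
  }
}
```

Looking through a cast is not automatically safe: the string '01200' cast to an integer equals 1200 even though the two strings differ. A real implementation would therefore have to either reject casts or hand the source a typed comparison it evaluates with the same semantics as Spark.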
--
This message was sent by Atlassian Jira
(v8.3.4#803005)