[jira] [Comment Edited] (SPARK-33915) Allow json expression to be pushable column

Ted Yu (Jira) Thu, 31 Dec 2020 07:17:08 -0800


    [ 
https://issues.apache.org/jira/browse/SPARK-33915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17255389#comment-17255389
 ]


Ted Yu edited comment on SPARK-33915 at 12/31/20, 3:16 PM:
-----------------------------------------------------------

Here is the plan prior to predicate pushdown:
{code}
2020-12-26 03:28:59,926 (Time-limited test) [DEBUG - 
org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] Adaptive 
execution enabled for plan: Sort [id#34 ASC NULLS FIRST], true, 0
+- Project [id#34, address#35, phone#37, get_json_object(phone#37, $.code) AS 
phone#33]
   +- Filter (get_json_object(phone#37, $.code) = 1200)
      +- BatchScan[id#34, address#35, phone#37] Cassandra Scan: test.person
 - Cassandra Filters: []
 - Requested Columns: [id,address,phone]
{code}
Here is the plan with pushdown:
{code}
2020-12-28 01:40:08,150 (Time-limited test) [DEBUG - 
org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] Adaptive 
execution enabled for plan: Sort [id#34 ASC NULLS FIRST], true, 0
    +- Project [id#34, address#35, phone#37, get_json_object(phone#37, $.code) 
AS phone#33]
       +- BatchScan[id#34, address#35, phone#37] Cassandra Scan: test.person
     - Cassandra Filters: [[phone->'code' = ?, 1200]]
     - Requested Columns: [id,address,phone]

{code}


was (Author: [email protected]):
Here is the plan prior to predicate pushdown:
{code}
2020-12-26 03:28:59,926 (Time-limited test) [DEBUG - 
org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] Adaptive 
execution enabled for plan: Sort [id#34 ASC NULLS FIRST], true, 0
+- Project [id#34, address#35, phone#37, get_json_object(phone#37, $.code) AS 
phone#33]
   +- Filter (get_json_object(phone#37, $.phone) = 1200)
      +- BatchScan[id#34, address#35, phone#37] Cassandra Scan: test.person
 - Cassandra Filters: []
 - Requested Columns: [id,address,phone]
{code}
Here is the plan with pushdown:
{code}
2020-12-28 01:40:08,150 (Time-limited test) [DEBUG - 
org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] Adaptive 
execution enabled for plan: Sort [id#34 ASC NULLS FIRST], true, 0
    +- Project [id#34, address#35, phone#37, get_json_object(phone#37, $.code) 
AS phone#33]
       +- BatchScan[id#34, address#35, phone#37] Cassandra Scan: test.person
     - Cassandra Filters: [[phone->'phone' = ?, 1200]]
     - Requested Columns: [id,address,phone]

{code}

> Allow json expression to be pushable column
> -------------------------------------------
>
>                 Key: SPARK-33915
>                 URL: https://issues.apache.org/jira/browse/SPARK-33915
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.0.1
>            Reporter: Ted Yu
>            Priority: Major
>
> Currently PushableColumnBase provides no support for json / jsonb expression.
> Example of json expression:
> {code}
> get_json_object(phone, '$.code') = '1200'
> {code}
> If non-string literal is part of the expression, the presence of cast() would 
> complicate the situation.
> Implication is that implementation of SupportsPushDownFilters doesn't have a 
> chance to perform pushdown even if third party DB engine supports json 
> expression pushdown.
> This issue is for discussion and implementation of Spark core changes which 
> would allow json expression to be recognized as pushable column.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Comment Edited] (SPARK-33915) Allow json expression to be pushable column

Reply via email to