[jira] [Updated] (SPARK-34017) Pass json column information via pruneColumns()

Ted Yu (Jira) Tue, 05 Jan 2021 14:23:16 -0800


     [ 
https://issues.apache.org/jira/browse/SPARK-34017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Ted Yu updated SPARK-34017:
---------------------------
    Description: 
Currently PushDownUtils#pruneColumns only passes root fields to 
SupportsPushDownRequiredColumns implementation(s).
{code}
2021-01-05 19:36:07,437 (Time-limited test) [DEBUG - 
org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema 
projection List(id#33, address#34, phone#36, get_json_object(phone#36, $.code) 
AS get_json_object(phone, $.code)#37)
2021-01-05 19:36:07,438 (Time-limited test) [DEBUG - 
org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema 
StructType(StructField(id,IntegerType,false), 
StructField(address,StringType,true), StructField(phone,StringType,true))
{code}
The first line shows projections and the second line shows the pruned schema.

We can see that get_json_object(phone#36, $.code) is filtered. This expression 
retrieves field 'code' from phone json column.

We should allow json column information to be passed via pruneColumns().

  was:

Currently PushDownUtils#pruneColumns only passes root fields to 
SupportsPushDownRequiredColumns implementation(s).

2021-01-05 19:36:07,437 (Time-limited test) [DEBUG - 
org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema 
projection List(id#33, address#34, phone#36, get_json_object(phone#36, $.code) 
AS get_json_object(phone, $.code)#37)
2021-01-05 19:36:07,438 (Time-limited test) [DEBUG - 
org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema 
StructType(StructField(id,IntegerType,false), 
StructField(address,StringType,true), StructField(phone,StringType,true))

The first line shows projections and the second line shows the pruned schema.

We can see that get_json_object(phone#36, $.code) is filtered. This expression 
retrieves field 'code' from phone json column.

We should allow json column information to be passed via pruneColumns().


> Pass json column information via pruneColumns()
> -----------------------------------------------
>
>                 Key: SPARK-34017
>                 URL: https://issues.apache.org/jira/browse/SPARK-34017
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.1
>            Reporter: Ted Yu
>            Priority: Major
>
> Currently PushDownUtils#pruneColumns only passes root fields to 
> SupportsPushDownRequiredColumns implementation(s).
> {code}
> 2021-01-05 19:36:07,437 (Time-limited test) [DEBUG - 
> org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema 
> projection List(id#33, address#34, phone#36, get_json_object(phone#36, 
> $.code) AS get_json_object(phone, $.code)#37)
> 2021-01-05 19:36:07,438 (Time-limited test) [DEBUG - 
> org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema 
> StructType(StructField(id,IntegerType,false), 
> StructField(address,StringType,true), StructField(phone,StringType,true))
> {code}
> The first line shows projections and the second line shows the pruned schema.
> We can see that get_json_object(phone#36, $.code) is filtered. This 
> expression retrieves field 'code' from phone json column.
> We should allow json column information to be passed via pruneColumns().



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Updated] (SPARK-34017) Pass json column information via pruneColumns()

Reply via email to