[
https://issues.apache.org/jira/browse/SPARK-34017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ted Yu updated SPARK-34017:
---------------------------
Description:
Currently PushDownUtils#pruneColumns only passes root fields to
SupportsPushDownRequiredColumns implementation(s).
{code}
2021-01-05 19:36:07,437 (Time-limited test) [DEBUG -
org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema
projection List(id#33, address#34, phone#36, get_json_object(phone#36, $.code)
AS get_json_object(phone, $.code)#37)
2021-01-05 19:36:07,438 (Time-limited test) [DEBUG -
org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema
StructType(StructField(id,IntegerType,false),
StructField(address,StringType,true), StructField(phone,StringType,true))
{code}
The first log entry shows the projection list and the second shows the pruned
schema. The expression get_json_object(phone#36, $.code), which retrieves the
field 'code' from the JSON column phone, is missing from the pruned schema:
only the root column phone is passed down.
We should allow JSON column information to be passed via pruneColumns().
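To make the gap concrete, here is a minimal, Spark-free sketch of the two pieces of information involved. It is not Spark code: the class JsonPruneSketch, its methods rootColumns and jsonPaths, and the use of plain strings in place of Catalyst expressions are all assumptions made for illustration. The first method models what a data source sees today (root columns only); the second models the extra per-column JSON path information this issue asks to pass along.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical toy model; none of these names exist in Spark.
public class JsonPruneSketch {
    // Matches get_json_object(<column>, <path>), e.g. get_json_object(phone, $.code).
    private static final Pattern JSON_CALL =
        Pattern.compile("get_json_object\\(\\s*(\\w+)\\s*,\\s*(\\$[^)\\s]*)\\s*\\)");

    /** Root columns referenced by the projection list, in first-seen order. */
    static Set<String> rootColumns(List<String> projections) {
        Set<String> roots = new LinkedHashSet<>();
        for (String projection : projections) {
            String p = projection.trim();
            Matcher m = JSON_CALL.matcher(p);
            // A JSON lookup contributes only its underlying column here,
            // which mirrors the pruned schema shown in the log above.
            roots.add(m.matches() ? m.group(1) : p);
        }
        return roots;
    }

    /** JSON paths requested per column: the information that is currently dropped. */
    static Map<String, List<String>> jsonPaths(List<String> projections) {
        Map<String, List<String>> paths = new LinkedHashMap<>();
        for (String projection : projections) {
            Matcher m = JSON_CALL.matcher(projection.trim());
            if (m.matches()) {
                paths.computeIfAbsent(m.group(1), k -> new ArrayList<>()).add(m.group(2));
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        List<String> projections =
            List.of("id", "address", "phone", "get_json_object(phone, $.code)");
        System.out.println(rootColumns(projections)); // [id, address, phone]
        System.out.println(jsonPaths(projections));   // {phone=[$.code]}
    }
}
```

With only the first result, a source has to return the whole phone column; with the second as well, it could in principle read just the 'code' field.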
> Pass json column information via pruneColumns()
> -----------------------------------------------
>
> Key: SPARK-34017
> URL: https://issues.apache.org/jira/browse/SPARK-34017
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.1
> Reporter: Ted Yu
> Priority: Major