Ted Yu created SPARK-34017:
------------------------------
Summary: Pass json column information via pruneColumns()
Key: SPARK-34017
URL: https://issues.apache.org/jira/browse/SPARK-34017
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.0.1
Reporter: Ted Yu
Currently, PushDownUtils#pruneColumns passes only root fields to
SupportsPushDownRequiredColumns implementation(s), as the following debug
output shows:
2021-01-05 19:36:07,437 (Time-limited test) [DEBUG -
org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema
projection List(id#33, address#34, phone#36, get_json_object(phone#36, $.code)
AS get_json_object(phone, $.code)#37)
2021-01-05 19:36:07,438 (Time-limited test) [DEBUG -
org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema
StructType(StructField(id,IntegerType,false),
StructField(address,StringType,true), StructField(phone,StringType,true))
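The first log entry shows the projection list and the second shows the pruned
schema. The expression get_json_object(phone#36, $.code), which retrieves the
field 'code' from the JSON column phone, is dropped during pruning: the pruned
schema contains only the root column phone, with no record of the JSON path.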
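
To make the loss of information concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark code: the Projection class and prune_to_root_fields function are hypothetical names, and the "email" column is invented because the log does not show the full table schema. It only illustrates that reducing projections to root columns discards the $.code path.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Projection:
    """Hypothetical stand-in for one projected expression."""
    column: str                      # root column the expression reads
    json_path: Optional[str] = None  # set when wrapped in get_json_object


def prune_to_root_fields(projections: List[Projection], schema: List[str]) -> List[str]:
    """Keep only the referenced root columns, in schema order.

    This models root-only pruning: any JSON path is ignored.
    """
    referenced = {p.column for p in projections}
    return [name for name in schema if name in referenced]


# Projections taken from the first log entry.
projections = [
    Projection("id"),
    Projection("address"),
    Projection("phone"),
    Projection("phone", json_path="$.code"),
]

# Assumed table schema; "email" is an invented, unreferenced column.
schema = ["id", "address", "email", "phone"]

pruned = prune_to_root_fields(projections, schema)
print(pruned)  # ['id', 'address', 'phone']
```

In this model the data source receives 'phone' but cannot tell that only '$.code' was requested from it.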
We should allow JSON column information, such as the requested path $.code,
to be passed via pruneColumns().
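
One possible shape for the extra information, again as a plain Python sketch and not as a proposal for the actual Spark interface: alongside the pruned root columns, return a map from each JSON column to the paths requested from it. The function name prune_with_json_paths, the tuple encoding of projections, and the "email" column are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (root column, JSON path or None): hypothetical encoding of one projection.
Projection = Tuple[str, Optional[str]]


def prune_with_json_paths(
    projections: List[Projection], schema: List[str]
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Return the pruned root columns plus the JSON paths requested per column."""
    referenced = {col for col, _ in projections}
    json_paths: Dict[str, List[str]] = defaultdict(list)
    for col, path in projections:
        # Record each distinct path once, preserving first-seen order.
        if path is not None and path not in json_paths[col]:
            json_paths[col].append(path)
    pruned = [name for name in schema if name in referenced]
    return pruned, dict(json_paths)


projections = [
    ("id", None),
    ("address", None),
    ("phone", None),
    ("phone", "$.code"),
]
schema = ["id", "address", "email", "phone"]  # "email" is an invented column

columns, paths = prune_with_json_paths(projections, schema)
print(columns)  # ['id', 'address', 'phone']
print(paths)    # {'phone': ['$.code']}
```

In the logged query the whole phone column is also projected, so a source would still have to return it in full. The path map would matter most when a JSON column is referenced only through get_json_object.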