[ https://issues.apache.org/jira/browse/SPARK-34017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17259423#comment-17259423 ]
Ted Yu commented on SPARK-34017:
--------------------------------
For PushDownUtils#pruneColumns, I am experimenting with the following:
{code}
case r: SupportsPushDownRequiredColumns if SQLConf.get.nestedSchemaPruningEnabled =>
  // Matches the string form of a get_json_object call, e.g.
  // get_json_object(phone#36, $.code); group 1 is the column, group 2 the JSON path.
  val JSONCapture = "get_json_object\\((.*), *(.*)\\)".r
  val jsonRootFields: ArrayBuffer[RootField] = ArrayBuffer()
  // Visit every node of every projection and record a RootField for each match.
  projects.foreach { _.foreach { f =>
    f.toString match {
      case JSONCapture(column, field) =>
        jsonRootFields += RootField(StructField(column, f.dataType, f.nullable),
          derivedFromAtt = false, prunedIfAnyChildAccessed = true)
      case _ => logDebug("else " + f)
    }
  }}
  val rootFields = SchemaPruning.identifyRootFields(projects, filters) ++ jsonRootFields
{code}
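To make the behaviour of the regex concrete, here is a minimal standalone sketch (no Spark dependency) that applies the same pattern to the expression strings from the debug log in the issue description. The object name JsonCaptureDemo and the helper capture are hypothetical, introduced only for this illustration. It assumes expression strings have the same shape as in that log.

```scala
object JsonCaptureDemo {
  // Same pattern as in the experiment above.
  val JSONCapture = "get_json_object\\((.*), *(.*)\\)".r

  // Hypothetical helper: returns (column, path) only when the WHOLE string
  // matches, which is how a Scala Regex behaves in a pattern match.
  def capture(expr: String): Option[(String, String)] = expr match {
    case JSONCapture(column, field) => Some((column, field))
    case _ => None
  }

  def main(args: Array[String]): Unit = {
    // The bare call matches; note the captured column keeps the "#36" suffix.
    println(capture("get_json_object(phone#36, $.code)")) // Some((phone#36,$.code))
    // The aliased form ends in "#37", not ")", so the pattern does not match.
    println(capture(
      "get_json_object(phone#36, $.code) AS get_json_object(phone, $.code)#37")) // None
    // A plain attribute reference does not match either.
    println(capture("id#33")) // None
  }
}
```

Under these assumptions, the match comes from the inner get_json_object node rather than from the enclosing alias, and the captured column name still carries the expression ID (phone#36 rather than phone), which a caller would probably need to strip before building a StructField.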
> Pass json column information via pruneColumns()
> -----------------------------------------------
>
> Key: SPARK-34017
> URL: https://issues.apache.org/jira/browse/SPARK-34017
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.1
> Reporter: Ted Yu
> Priority: Major
>
> Currently PushDownUtils#pruneColumns only passes root fields to
> SupportsPushDownRequiredColumns implementation(s).
> {code}
> 2021-01-05 19:36:07,437 (Time-limited test) [DEBUG -
> org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema
> projection List(id#33, address#34, phone#36, get_json_object(phone#36,
> $.code) AS get_json_object(phone, $.code)#37)
> 2021-01-05 19:36:07,438 (Time-limited test) [DEBUG -
> org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema
> StructType(StructField(id,IntegerType,false),
> StructField(address,StringType,true), StructField(phone,StringType,true))
> {code}
> The first log entry shows the projections and the second shows the pruned schema.
> The get_json_object(phone#36, $.code) expression, which retrieves the field 'code'
> from the JSON column phone, is missing from the pruned schema: only the root
> column phone is passed down.
> We should allow JSON column information to be passed via pruneColumns().