[
https://issues.apache.org/jira/browse/SPARK-33300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17225089#comment-17225089
]
chendihao commented on SPARK-33300:
-----------------------------------
Great, and thanks [~EveLiao]. I'm not familiar with the Catalyst optimizer, but
I would expect it to apply the rule recursively to child expressions. The issue
is easy to reproduce in Spark 3.0; please let me know if you need any help.
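To make "recursively run the rule in child expressions" concrete, here is a
minimal toy sketch in plain Python. It is not Catalyst code: the Column and
Cast classes and the simplify_cast and transform_up helpers are hypothetical
stand-ins that I made up for illustration, and the real optimizer may well
behave differently.

```python
from dataclasses import dataclass
from typing import Union


# Toy expression nodes; illustrative stand-ins, NOT Catalyst's classes.
@dataclass(frozen=True)
class Column:
    name: str
    dtype: str


@dataclass(frozen=True)
class Cast:
    child: "Expr"
    dtype: str  # target type of the cast


Expr = Union[Column, Cast]


def simplify_cast(e: Expr) -> Expr:
    """Single-node rule: drop a cast whose child already has the target type."""
    if isinstance(e, Cast) and e.child.dtype == e.dtype:
        return e.child
    return e


def transform_up(e: Expr, rule) -> Expr:
    """Rewrite the children first, then apply the rule to the rebuilt node."""
    if isinstance(e, Cast):
        e = Cast(transform_up(e.child, rule), e.dtype)
    return rule(e)


col = Column("string_date", "string")
nested = Cast(Cast(col, "string"), "string")

# Applying the rule once at the root strips only the outer cast.
print(simplify_cast(nested))

# Applying it bottom-up over the whole tree strips both casts.
print(transform_up(nested, simplify_cast))
```

In this toy model, one application at the root leaves the inner cast in place,
while the bottom-up traversal reduces the whole expression to the bare column.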
> Rule SimplifyCasts will not work for nested columns
> ---------------------------------------------------
>
> Key: SPARK-33300
> URL: https://issues.apache.org/jira/browse/SPARK-33300
> Project: Spark
> Issue Type: Bug
> Components: Optimizer, SQL
> Affects Versions: 3.0.0
> Reporter: chendihao
> Priority: Minor
>
> We use Spark SQL and Catalyst to optimize our Spark jobs. We read the source
> code and tested the SimplifyCasts rule, which works for simple SQL without
> nested casts.
> For example, the SQL "select cast(string_date as string) from t1" is optimized
> as expected:
> {code:java}
> == Analyzed Logical Plan ==
> string_date: string
> Project [cast(string_date#12 as string) AS string_date#24]
> +- SubqueryAlias t1
>    +- LogicalRDD [name#8, c1#9, c2#10, c5#11L, string_date#12, string_timestamp#13, timestamp_field#14, bool_field#15], false
>
> == Optimized Logical Plan ==
> Project [string_date#12]
> +- LogicalRDD [name#8, c1#9, c2#10, c5#11L, string_date#12, string_timestamp#13, timestamp_field#14, bool_field#15], false
> {code}
> However, it fails to optimize a nested cast such as "select
> cast(cast(string_date as string) as string) from t1":
> {code:java}
> == Analyzed Logical Plan ==
> CAST(CAST(string_date AS STRING) AS STRING): string
> Project [cast(cast(string_date#12 as string) as string) AS CAST(CAST(string_date AS STRING) AS STRING)#24]
> +- SubqueryAlias t1
>    +- LogicalRDD [name#8, c1#9, c2#10, c5#11L, string_date#12, string_timestamp#13, timestamp_field#14, bool_field#15], false
>
> == Optimized Logical Plan ==
> Project [string_date#12 AS CAST(CAST(string_date AS STRING) AS STRING)#24]
> +- LogicalRDD [name#8, c1#9, c2#10, c5#11L, string_date#12, string_timestamp#13, timestamp_field#14, bool_field#15], false
> {code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)