[
https://issues.apache.org/jira/browse/SPARK-30469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hu Fuwang updated SPARK-30469:
------------------------------
Summary: Partition columns should not be involved when calculating
sizeInBytes of Project logical plan (was: Hive Partition columns should not be
involved when calculating sizeInBytes of Project logical plan)
> Partition columns should not be involved when calculating sizeInBytes of
> Project logical plan
> ---------------------------------------------------------------------------------------------
>
> Key: SPARK-30469
> URL: https://issues.apache.org/jira/browse/SPARK-30469
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Hu Fuwang
> Priority: Major
>
> When computing the statistics of a Project logical plan with CBO disabled,
> Spark calls SizeInBytesOnlyStatsPlanVisitor.visitUnaryNode to estimate the
> size in bytes. The estimate scales the child plan's sizeInBytes by the ratio
> of the Project's row size to the child plan's row size.
> The row size is computed from the output attributes (columns).
> Currently, SizeInBytesOnlyStatsPlanVisitor.visitUnaryNode also counts the
> partition columns of a Hive table, which is not reasonable, because Hive
> partition columns do not actually contribute to sizeInBytes.
> This can make the estimated sizeInBytes inaccurate.
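
To illustrate the effect described above, here is a minimal Python sketch of the row-size ratio estimate. It is a simplified model, not Spark's actual code: the `Column` class, the 8-byte per-row overhead, the per-column sizes, and the `include_partition` switch are all assumptions chosen for illustration. It also assumes that the child's sizeInBytes reflects only the data files, which do not hold the partition column values.

```python
from dataclasses import dataclass

# Assumed fixed per-row overhead in bytes (illustrative, keeps row size non-zero).
ROW_OVERHEAD = 8


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for an output attribute."""
    name: str
    default_size: int          # assumed per-value size in bytes
    is_partition: bool = False


def row_size(columns, include_partition=True):
    """Estimated bytes per row: overhead plus the size of each counted column."""
    return ROW_OVERHEAD + sum(
        c.default_size
        for c in columns
        if include_partition or not c.is_partition
    )


def project_size_in_bytes(child_size, child_cols, project_cols,
                          include_partition=True):
    """Scale the child's size by (project row size / child row size)."""
    child_row = row_size(child_cols, include_partition)
    out_row = row_size(project_cols, include_partition)
    # Integer division; clamp to 1 so the estimate is never zero.
    return max(child_size * out_row // child_row, 1)


# Hypothetical table: two data columns plus one partition column.
id_col = Column("id", 8)
payload = Column("payload", 20)
dt = Column("dt", 20, is_partition=True)

child_cols = [id_col, payload, dt]
project_cols = [id_col]        # SELECT id FROM t
child_size = 3600              # assumed size of the data files, in bytes

with_partition = project_size_in_bytes(child_size, child_cols, project_cols, True)
without_partition = project_size_in_bytes(child_size, child_cols, project_cols, False)
print(with_partition, without_partition)
```

In this made-up case, counting the partition column inflates the child's row size from 36 to 56 bytes, so the Project estimate drops from 1600 to 1028 bytes even though the projected column is unchanged.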