Hu Fuwang created SPARK-30469:
---------------------------------
Summary: Hive Partition columns should not be involved when calculating sizeInBytes of Project logical plan
Key: SPARK-30469
URL: https://issues.apache.org/jira/browse/SPARK-30469
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.0.0
Reporter: Hu Fuwang
When computing the statistics of a Project logical plan with CBO disabled,
Spark calls SizeInBytesOnlyStatsPlanVisitor.visitUnaryNode to estimate its
size in bytes. That method scales the child plan's sizeInBytes by the ratio of
the Project's row size to the child's row size.
The row size is derived from each plan's output attributes (columns).
Currently, SizeInBytesOnlyStatsPlanVisitor.visitUnaryNode includes the
partition columns of a Hive table in this calculation. That is not reasonable:
Hive partition column values are encoded in the partition directory paths
rather than stored in the data files, so they do not contribute to the table's
sizeInBytes. As a result, the estimated sizeInBytes of the Project can be
inaccurate.
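Written out as a formula, the estimate looks roughly like the following. This is a sketch of the Spark 3.0 logic, not a quotation of it; in particular, the fixed 8-byte per-row overhead and the use of each column's data type default size (as in EstimationUtils.getSizePerRow) should be treated as assumptions.

```latex
\mathrm{sizeInBytes}(\mathrm{Project})
  = \mathrm{sizeInBytes}(\mathrm{child})
    \times \frac{\mathrm{rowSize}(\mathrm{Project.output})}{\mathrm{rowSize}(\mathrm{child.output})},
\qquad
\mathrm{rowSize}(A) = 8 + \sum_{a \in A} \mathrm{defaultSize}(a)
```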
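The effect can be illustrated with a minimal Python model of that arithmetic. Everything here is hypothetical: `Attr`, `row_size` and `project_size` are illustrative names, not Spark APIs. The 8-byte row overhead and the default sizes (4 for INT, 20 for STRING) are assumed to mirror Spark's DataType defaults, and the 1000-byte child size is invented. The table has data columns `a` and `b` plus a partition column `dt`, and the query projects only `a`.

```python
from dataclasses import dataclass


# Hypothetical, simplified stand-in for a Catalyst output attribute.
@dataclass(frozen=True)
class Attr:
    name: str
    default_size: int           # assumed: 4 for INT, 20 for STRING
    is_partition: bool = False  # True for a Hive partition column


ROW_OVERHEAD = 8  # assumed generic per-row overhead


def row_size(attrs, skip_partition=False):
    """Estimated bytes per row: overhead plus each column's default size."""
    return ROW_OVERHEAD + sum(
        a.default_size for a in attrs if not (skip_partition and a.is_partition)
    )


def project_size(child_bytes, child_attrs, project_attrs, skip_partition=False):
    """Scale the child's size by the ratio of output row size to child row size.

    skip_partition=True is one hypothetical variant of the proposal:
    partition columns are ignored on both sides of the ratio.
    """
    size = (child_bytes * row_size(project_attrs, skip_partition)
            // row_size(child_attrs, skip_partition))
    return max(size, 1)  # the estimate is never allowed to drop to zero


a = Attr("a", 4)
b = Attr("b", 20)
dt = Attr("dt", 20, is_partition=True)
child = [a, b, dt]  # Hive table scan output: data columns + partition column
project = [a]       # SELECT a FROM t

print(project_size(1000, child, project))                       # 230
print(project_size(1000, child, project, skip_partition=True))  # 375
```

In this model, counting `dt` inflates the child's per-row size from 32 to 52 bytes and shrinks the Project estimate from 375 to 230 bytes, even though `dt` takes up no space in the data files.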