Hu Fuwang created SPARK-30469:
---------------------------------

             Summary: Hive Partition columns should not be involved when 
calculating sizeInBytes of Project logical plan
                 Key: SPARK-30469
                 URL: https://issues.apache.org/jira/browse/SPARK-30469
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Hu Fuwang


When getting the statistics of a Project logical plan with CBO disabled, Spark
calls SizeInBytesOnlyStatsPlanVisitor.visitUnaryNode to calculate the size in
bytes. That method computes the ratio between the row size of the Project plan
and the row size of its child plan, then scales the child's sizeInBytes by that
ratio.
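The ratio can be made concrete with a small model. This is a minimal Python sketch under stated assumptions, not Spark's Scala implementation. It assumes that a row's size is the sum of per-column default sizes plus a fixed 8-byte row overhead, and that the node emits the same number of rows as its child. The type sizes in `DEFAULT_SIZE` and the function names `row_size` and `visit_unary_node` are illustrative, not taken from Spark.

```python
# Hypothetical model of the non-CBO size estimate for a unary node (e.g. Project).
# Assumptions: row size = sum of per-column default sizes + a fixed overhead,
# and the node produces the same number of rows as its child.

ROW_OVERHEAD = 8  # assumed per-row overhead in bytes; also prevents division by zero

# Assumed per-type default sizes in bytes (illustrative values).
DEFAULT_SIZE = {"int": 4, "bigint": 8, "double": 8, "string": 20}


def row_size(columns):
    """Estimated size of one row, given a list of (name, type) pairs."""
    return sum(DEFAULT_SIZE[t] for _, t in columns) + ROW_OVERHEAD


def visit_unary_node(child_size_in_bytes, child_output, output):
    """Scale the child's sizeInBytes by the output/child row-size ratio."""
    size = child_size_in_bytes * row_size(output) // row_size(child_output)
    return max(size, 1)  # never report a size of zero bytes


child = [("id", "int"), ("name", "string"), ("score", "double")]
project = [("id", "int"), ("score", "double")]  # SELECT id, score
print(visit_unary_node(4000, child, project))  # 2000
```

Here the child row is 40 bytes and the projected row is 20 bytes, so the 4000-byte child is estimated to yield 2000 bytes after the projection.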

The row size is computed from the output attributes (columns). Currently,
SizeInBytesOnlyStatsPlanVisitor.visitUnaryNode also includes the partition
columns of a Hive table. This is not reasonable, because Hive partition columns
do not actually contribute to sizeInBytes.
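A hypothetical Python model shows how counting a partition column can skew the estimate. The assumptions here are not taken from the issue. A row's size is taken as the sum of per-type default sizes plus an 8-byte overhead. The child's sizeInBytes is assumed to reflect only the table's data files, which for Hive tables do not store partition values because those live in the directory paths. The proposed behavior is read as ignoring partition columns in both the child and the output row sizes, a detail the issue does not spell out. Column names, type sizes, and the helpers `row_size` and `project_size` are illustrative.

```python
# Hypothetical comparison: Project size estimate with and without Hive
# partition columns counted in the row sizes. Assumptions (not Spark's code):
#   * row size = sum of per-type default sizes + 8 bytes of row overhead;
#   * child_size reflects only the data files, which hold no partition values.

ROW_OVERHEAD = 8
DEFAULT_SIZE = {"int": 4, "bigint": 8, "double": 8, "string": 20}


def row_size(columns, partition_cols=frozenset()):
    """Estimated row size, skipping any column named in partition_cols."""
    return ROW_OVERHEAD + sum(
        DEFAULT_SIZE[t] for name, t in columns if name not in partition_cols
    )


def project_size(child_size, child_output, output, partition_cols=frozenset()):
    """Child size scaled by the output/child row-size ratio (same row count)."""
    out_row = row_size(output, partition_cols)
    child_row = row_size(child_output, partition_cols)
    return max(child_size * out_row // child_row, 1)


# Hive table with data columns (id int, name string), partitioned by dt string.
table = [("id", "int"), ("name", "string"), ("dt", "string")]
selected = [("id", "int")]  # SELECT id FROM table

current = project_size(1000, table, selected)            # dt counted
proposed = project_size(1000, table, selected, {"dt"})   # dt ignored
print(current, proposed)  # 230 375
```

With 1000 bytes of data files, counting `dt` gives an estimate of 230 bytes, while ignoring it gives 375 bytes. Under this model the second figure matches the share of `id` in the rows that are actually stored (12 of 32 bytes).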

As a result, the sizeInBytes estimate can be inaccurate.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)