[jira] [Updated] (SPARK-30469) Partition columns should not be involved when calculating sizeInBytes of Project logical plan

Hu Fuwang (Jira) Thu, 09 Jan 2020 01:21:07 -0800


     [ https://issues.apache.org/jira/browse/SPARK-30469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Hu Fuwang updated SPARK-30469:
------------------------------
    Summary: Partition columns should not be involved when calculating 
sizeInBytes of Project logical plan  (was: Hive Partition columns should not be 
involved when calculating sizeInBytes of Project logical plan)

> Partition columns should not be involved when calculating sizeInBytes of 
> Project logical plan
> ---------------------------------------------------------------------------------------------
>
>                 Key: SPARK-30469
>                 URL: https://issues.apache.org/jira/browse/SPARK-30469
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Hu Fuwang
>            Priority: Major
>
> When getting the statistics of a Project logical plan with CBO disabled,
> Spark calls SizeInBytesOnlyStatsPlanVisitor.visitUnaryNode to calculate
> the size in bytes. This computes the ratio of the row size of the Project
> plan to that of its child plan and applies it to the child's sizeInBytes.
> The row size is computed from the output attributes (columns).
> Currently, SizeInBytesOnlyStatsPlanVisitor.visitUnaryNode also counts the
> partition columns of a Hive table, which is not reasonable, because Hive
> partition columns do not actually contribute to sizeInBytes.
> This can make the sizeInBytes estimate inaccurate.
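A rough Python model of the size-only estimate described above may make the problem concrete. It is a simplified sketch, not Spark's code: the 8-byte per-row overhead, the per-type default sizes, and the names row_size and project_size_in_bytes are assumptions for illustration. It scales the child's sizeInBytes by the ratio of the output row size to the child row size, and can optionally leave partition columns out of both row sizes, which is the behaviour this issue argues for.

```python
# Hypothetical sketch: a simplified model of the size-only estimate for a
# Project node. Names, overhead and default sizes are illustrative assumptions.

ROW_OVERHEAD = 8  # assumed generic per-row overhead, in bytes

# Assumed per-type default column sizes, in bytes.
DEFAULT_SIZE = {"int": 4, "bigint": 8, "double": 8, "string": 20}


def row_size(columns, exclude=frozenset()):
    """Estimated bytes per row for (name, type) columns, skipping names in `exclude`."""
    return ROW_OVERHEAD + sum(
        DEFAULT_SIZE[dtype] for name, dtype in columns if name not in exclude
    )


def project_size_in_bytes(child_size, child_cols, project_cols,
                          partition_cols=frozenset()):
    """Scale the child's sizeInBytes by output_row_size / child_row_size.

    With the default empty `partition_cols`, every column counts (the current
    behaviour described in the issue). Passing the partition column names
    drops them from both row sizes, since their values are not stored in the
    data files that the child's sizeInBytes is based on.
    """
    child_row = row_size(child_cols, partition_cols)
    output_row = row_size(project_cols, partition_cols)
    size = child_size * output_row // child_row
    return max(size, 1)  # keep the estimate non-zero


if __name__ == "__main__":
    table = [("id", "int"), ("name", "string"), ("dt", "string")]  # dt is the partition column
    projection = [("name", "string"), ("dt", "string")]
    print(project_size_in_bytes(1000, table, projection))          # counts dt
    print(project_size_in_bytes(1000, table, projection, {"dt"}))  # excludes dt
```

For a table with columns id (int), name (string) and partition column dt (string), 1000 bytes of data files, and a projection of name and dt, this model gives 923 bytes when dt is counted and 875 bytes when it is excluded. Projecting only dt gives 538 versus 250, even though no bytes of dt are read from the data files.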



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
