[ 
https://issues.apache.org/jira/browse/HIVE-12309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14999307#comment-14999307
 ] 

Prasanth Jayachandran commented on HIVE-12309:
----------------------------------------------

Left a minor comment in RB. I am worried about the scenario of INCOMPLETE 
column stats. What happens if column stats are missing or stale? Raw data size 
will always be updated (if the appropriate configs are on and if the file format 
supports it), but column stats freshness is not guaranteed. How do we deal with 
this in the estimation?
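
To make the concern concrete, here is a minimal sketch of one possible fallback, not the behavior of the attached patch. It assumes the planner can tell whether column stats are complete. The class, the ColStatsState enum, and the estimateDataSize method are hypothetical names used only for illustration. Note that a completeness flag alone would not catch stats that are present but stale.

```java
public class ScanSizeEstimate {
    /** Hypothetical marker for how much of the column stats is available. */
    public enum ColStatsState { COMPLETE, PARTIAL, NONE }

    /**
     * Estimates the data size of a table scan.
     *
     * Assumption: column stats are trusted only when they are COMPLETE.
     * Otherwise the estimate falls back to rawDataSize, which the comment
     * above describes as always being updated when the configs are on.
     *
     * @param numRows     row count of the table
     * @param rawDataSize raw data size in bytes from basic stats
     * @param state       availability of column stats
     * @param avgColLens  average width in bytes of each scanned column
     */
    public static long estimateDataSize(long numRows, long rawDataSize,
                                        ColStatsState state, double[] avgColLens) {
        if (state != ColStatsState.COMPLETE || avgColLens.length == 0) {
            // Missing or partial column stats: do not mix sources, use rawDataSize.
            return rawDataSize;
        }
        double rowWidth = 0.0;
        for (double len : avgColLens) {
            rowWidth += len;
        }
        // Column-stats-based estimate: rows times the summed average column width.
        return Math.round(numRows * rowWidth);
    }
}
```

With this shape, a scan over 1000 rows and two columns averaging 4.0 and 12.5 bytes would be sized from column stats only when they are COMPLETE. In every other state it keeps the rawDataSize figure.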

> TableScan should use column stats when available for better data size estimate
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-12309
>                 URL: https://issues.apache.org/jira/browse/HIVE-12309
>             Project: Hive
>          Issue Type: Improvement
>          Components: Statistics
>            Reporter: Ashutosh Chauhan
>            Assignee: Ashutosh Chauhan
>         Attachments: HIVE-12309.2.patch, HIVE-12309.patch
>
>
> Currently, all other operators use column stats to figure out data size, 
> whereas TableScan relies on rawDataSize. This inconsistency can result in 
> TS having a lower data size than subsequent operators.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
