[ 
https://issues.apache.org/jira/browse/HIVE-13773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297119#comment-15297119
 ] 

Prasanth Jayachandran commented on HIVE-13773:
----------------------------------------------

[~pxiong] I initially added it for ORC writers (not the ORC updaters used for 
ACID). ORC writers implement the StatsProvidingRecordWriter interface, which 
returns the internally gathered stats (row count and raw data size). ACID was 
added later, and I guess it does not implement the interface because it cannot 
provide reliable stats (because of deletes). I wanted to make sure this works 
for the non-ACID use case. Also, this stats gathering should happen in both 
processOp() and closeOp(). The reason is that, with 
hive.optimize.sort.dynamic.partition enabled, there is only one record writer 
open per reducer at any point. Before closing the previous writer in 
processOp() we need to collect its statistics, and for the last writer we 
gather statistics in closeOp(). I am not clear why you are removing the stats 
collection from processOp().
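
To illustrate the flow described above, here is a minimal, self-contained 
sketch. It is not the actual FileSinkOperator code: every type below 
(WriterStats, SketchRecordWriter, SketchStatsProvidingWriter, CountingWriter, 
SketchSink) is a hypothetical stand-in, and "raw data size" is approximated by 
the string length of each row. It assumes rows arrive sorted by partition, as 
they would with hive.optimize.sort.dynamic.partition, so only one writer is 
open at a time.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical holder for the stats a writer gathers internally.
final class WriterStats {
    final long rowCount;
    final long rawDataSize;
    WriterStats(long rowCount, long rawDataSize) {
        this.rowCount = rowCount;
        this.rawDataSize = rawDataSize;
    }
}

// Simplified stand-in for a plain record writer (no stats support).
interface SketchRecordWriter {
    void write(String row);
    void close();
}

// Simplified stand-in for Hive's StatsProvidingRecordWriter.
interface SketchStatsProvidingWriter extends SketchRecordWriter {
    WriterStats getStats();
}

// Toy writer that counts rows and sums row lengths as the "raw data size".
final class CountingWriter implements SketchStatsProvidingWriter {
    private long rows;
    private long bytes;
    public void write(String row) { rows++; bytes += row.length(); }
    public void close() { }
    public WriterStats getStats() { return new WriterStats(rows, bytes); }
}

// Sketch of a sink that keeps a single open writer per reducer.
final class SketchSink {
    private final Map<String, WriterStats> collected = new LinkedHashMap<>();
    private String currentPartition;
    private SketchRecordWriter currentWriter;

    // Called once per row; input is assumed to be sorted by partition.
    void processOp(String partition, String row) {
        if (!partition.equals(currentPartition)) {
            // Partition changed: collect stats from the previous writer
            // before closing it, then open a writer for the new partition.
            collectStatsAndClose();
            currentPartition = partition;
            currentWriter = new CountingWriter();
        }
        currentWriter.write(row);
    }

    // Called once at the end; covers the last writer still open.
    void closeOp() {
        collectStatsAndClose();
    }

    private void collectStatsAndClose() {
        if (currentWriter == null) {
            return;
        }
        // Only writers that implement the stats interface contribute stats.
        if (currentWriter instanceof SketchStatsProvidingWriter) {
            collected.put(currentPartition,
                ((SketchStatsProvidingWriter) currentWriter).getStats());
        }
        currentWriter.close();
        currentWriter = null;
        currentPartition = null;
    }

    Map<String, WriterStats> collectedStats() { return collected; }
}

final class StatsSketchDemo {
    public static void main(String[] args) {
        SketchSink sink = new SketchSink();
        sink.processOp("p=1", "aa");
        sink.processOp("p=1", "bbb");
        sink.processOp("p=2", "c");   // closes the p=1 writer, stats collected
        sink.closeOp();               // closes the p=2 writer, stats collected
        for (Map.Entry<String, WriterStats> e : sink.collectedStats().entrySet()) {
            System.out.println(e.getKey() + " rows=" + e.getValue().rowCount
                + " rawDataSize=" + e.getValue().rawDataSize);
        }
    }
}
```

In this sketch, if the collection step were dropped from processOp(), only the 
last partition's writer would ever report stats, because every earlier writer 
is already closed by the time closeOp() runs.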

> Stats state is not captured correctly in dynpart_sort_optimization_acid.q
> -------------------------------------------------------------------------
>
>                 Key: HIVE-13773
>                 URL: https://issues.apache.org/jira/browse/HIVE-13773
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Pengcheng Xiong
>            Assignee: Pengcheng Xiong
>         Attachments: HIVE-13773.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
