[
https://issues.apache.org/jira/browse/HIVE-13773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297119#comment-15297119
]
Prasanth Jayachandran commented on HIVE-13773:
----------------------------------------------
[~pxiong] I initially added it for ORC writers (not the ORC updaters used for
ACID). ORC writers implement the StatsProvidingRecordWriter interface, which
returns the internally gathered stats (row count and raw data size). ACID was
added later, and I guess it does not implement the interface because it cannot
provide reliable stats (because of deletes). I wanted to make sure this works
for the non-ACID use case. Also, the stats gathering should happen in both
processOp() and closeOp(). The reason is that with
hive.optimize.sort.dynamic.partition there is only one record writer open per
reducer at any point. Before closing the previous writer in processOp() we need
to collect its statistics, and for the last writer we gather the statistics in
closeOp(). I am not clear why you are removing the stats collection from
processOp().
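
To illustrate the flow described above, here is a minimal Java sketch. It is
not Hive's actual operator code: CountingWriter, SinkOperator and run() are
hypothetical names made up for this example, and the raw data size is
approximated by string length. It only shows why stats have to be collected in
both places when a single writer is open at a time.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SortedDynPartStatsSketch {

    /** Hypothetical writer that tracks its own stats while writing. */
    static final class CountingWriter {
        final String partition;
        long rowCount;
        long rawDataSize;

        CountingWriter(String partition) {
            this.partition = partition;
        }

        void write(String row) {
            rowCount++;
            rawDataSize += row.length(); // assumption: size approximated by length
        }
    }

    /** Hypothetical operator that keeps at most one writer open at a time. */
    static final class SinkOperator {
        private final Map<String, long[]> stats = new LinkedHashMap<>();
        private CountingWriter current;

        /** Rows are assumed to arrive sorted by partition. */
        void processOp(String partition, String row) {
            if (current == null || !current.partition.equals(partition)) {
                // Partition changed: collect stats from the previous writer
                // before it is closed, then open a writer for the new partition.
                collectAndClose();
                current = new CountingWriter(partition);
            }
            current.write(row);
        }

        /** The last writer is never replaced in processOp(), so collect here. */
        void closeOp() {
            collectAndClose();
        }

        private void collectAndClose() {
            if (current == null) {
                return;
            }
            stats.put(current.partition,
                      new long[] {current.rowCount, current.rawDataSize});
            current = null;
        }

        Map<String, long[]> stats() {
            return stats;
        }
    }

    /** Feeds {partition, row} pairs through the operator and returns the stats. */
    static Map<String, long[]> run(Iterable<String[]> sortedRows) {
        SinkOperator op = new SinkOperator();
        for (String[] r : sortedRows) {
            op.processOp(r[0], r[1]);
        }
        op.closeOp();
        return op.stats();
    }
}
```

In this sketch, dropping the collectAndClose() call from processOp() would
leave stats only for the last partition a reducer wrote, since every earlier
writer is discarded before closeOp() runs.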
> Stats state is not captured correctly in dynpart_sort_optimization_acid.q
> -------------------------------------------------------------------------
>
> Key: HIVE-13773
> URL: https://issues.apache.org/jira/browse/HIVE-13773
> Project: Hive
> Issue Type: Sub-task
> Reporter: Pengcheng Xiong
> Assignee: Pengcheng Xiong
> Attachments: HIVE-13773.01.patch
>
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)