[ https://issues.apache.org/jira/browse/HIVE-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776082#comment-13776082 ]
Prasanth J commented on HIVE-5324:
----------------------------------

[~ashutoshc] I addressed your review comments in this patch.

| Also you need an instanceof check on outWriter for SerDeStats stats = ((StatsProvidingRecordWriter) outWriter).getStats(); otherwise this will throw ClassCastException for writers not implementing the interface.

This will not happen, because the boolean flag in the if condition is set only if the writer is an instance of StatsProvidingRecordWriter.

Making the patch available for HiveQA to pick up.

> Extend record writer and ORC reader/writer interfaces to provide statistics
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-5324
>                 URL: https://issues.apache.org/jira/browse/HIVE-5324
>             Project: Hive
>          Issue Type: New Feature
>   Affects Versions: 0.13.0
>            Reporter: Prasanth J
>            Assignee: Prasanth J
>              Labels: orcfile, statistics
>             Fix For: 0.13.0
>
>         Attachments: HIVE-5324.1.patch.txt, HIVE-5324.2.patch.txt,
> HIVE-5324.3.patch.txt
>
>
> The current implementation computes statistics (number of rows and raw
> data size) for every single row processed. The processOp() method in
> FileSinkOperator gets the raw data size for each row from the serde and
> accumulates the size in a hashmap while counting the number of rows. These
> accumulated statistics are then published to the metastore.
> In the case of ORC, ORC already stores enough statistics internally, which
> can be used when publishing the stats to the metastore. This avoids the
> duplication of work that happens in processOp(). Getting the statistics
> directly from ORC is also very cheap (they can be read directly from the
> file footer).
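
For readers following the instanceof discussion in the comment above, here is a minimal sketch of the guard pattern being described. It is not code from the patch. Only the interface name StatsProvidingRecordWriter and the getStats() call come from the comment; the Stats holder, the RecordWriter stand-in, the two writer classes, and the statsOrNull() helper are hypothetical and exist only to make the example self-contained.

```java
// Simplified stand-ins for illustration only; these are NOT Hive's actual classes.
class StatsGuardSketch {

    /** Hypothetical stats holder: row count and raw data size. */
    static final class Stats {
        final long rowCount;
        final long rawDataSize;

        Stats(long rowCount, long rawDataSize) {
            this.rowCount = rowCount;
            this.rawDataSize = rawDataSize;
        }
    }

    /** Hypothetical base writer interface. */
    interface RecordWriter {
        void write(String row);
    }

    /** Writers that can report their own statistics implement this. */
    interface StatsProvidingRecordWriter extends RecordWriter {
        Stats getStats();
    }

    /** A writer that keeps no statistics. */
    static final class PlainWriter implements RecordWriter {
        @Override
        public void write(String row) {
            // Intentionally does nothing in this sketch.
        }
    }

    /** A writer that tracks row count and total characters written. */
    static final class CountingWriter implements StatsProvidingRecordWriter {
        private long rows;
        private long size;

        @Override
        public void write(String row) {
            rows++;
            size += row.length();
        }

        @Override
        public Stats getStats() {
            return new Stats(rows, size);
        }
    }

    /**
     * Returns the writer's stats, or null if the writer does not provide any.
     * The flag is derived from instanceof, so the cast below cannot throw
     * ClassCastException.
     */
    static Stats statsOrNull(RecordWriter outWriter) {
        boolean statsFromWriter = outWriter instanceof StatsProvidingRecordWriter;
        if (statsFromWriter) {
            return ((StatsProvidingRecordWriter) outWriter).getStats();
        }
        return null;
    }

    public static void main(String[] args) {
        CountingWriter writer = new CountingWriter();
        writer.write("abc");
        writer.write("de");
        Stats stats = statsOrNull(writer);
        System.out.println(stats.rowCount + " rows, " + stats.rawDataSize + " chars");
        System.out.println(statsOrNull(new PlainWriter()) == null);
    }
}
```

As long as the flag is only ever set from the instanceof test, a separate check at the cast site is redundant; if the flag could be set anywhere else, repeating the check next to the cast would be the safer choice.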