[ https://issues.apache.org/jira/browse/HIVE-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780647#comment-13780647 ]
Hive QA commented on HIVE-5324:
-------------------------------

{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12605161/HIVE-5324.4.patch.txt

{color:green}SUCCESS:{color} +1 3179 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/940/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/940/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

> Extend record writer and ORC reader/writer interfaces to provide statistics
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-5324
>                 URL: https://issues.apache.org/jira/browse/HIVE-5324
>             Project: Hive
>          Issue Type: New Feature
>    Affects Versions: 0.13.0
>            Reporter: Prasanth J
>            Assignee: Prasanth J
>              Labels: orcfile, statistics
>             Fix For: 0.13.0
>
>         Attachments: HIVE-5324.1.patch.txt, HIVE-5324.2.patch.txt,
> HIVE-5324.3.patch.txt, HIVE-5324.4.patch.txt
>
>
> The current implementation computes statistics (number of rows and raw
> data size) for every single row processed. The processOp() method in
> FileSinkOperator gets the raw data size of each row from the SerDe and
> accumulates it in a hashmap while counting the number of rows. These
> accumulated statistics are then published to the metastore.
> In the case of ORC, the file already stores enough statistics internally,
> and they can be used when publishing stats to the metastore. This avoids
> duplicating the work done in processOp(). Getting the statistics directly
> from ORC is also very cheap, since they can be read directly from the file
> footer.
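The idea in the description can be illustrated with a small, self-contained Java sketch. It is not the code from the attached patches: every name here (`RecordWriter`, `StatsProvidingWriter`, `TrackingWriter`, `PlainWriter`, `writeAndCollect`, `rawSizeOf`) is hypothetical, and `rawSizeOf` is only a stand-in for the raw size a SerDe would report. The sink asks the writer for row count and raw data size when the writer can supply them, and falls back to per-row accumulation only when it cannot.

```java
import java.util.Arrays;
import java.util.List;

public class StatsSketch {

    /** Hypothetical minimal record-writer interface (not Hive's actual API). */
    interface RecordWriter {
        void write(String row);
    }

    /** Hypothetical extension: a writer that can report its own statistics. */
    interface StatsProvidingWriter extends RecordWriter {
        long getNumberOfRows();
        long getRawDataSize();
    }

    /** Row count and raw data size, the two statistics published to the metastore. */
    static final class Stats {
        final long numRows;
        final long rawDataSize;

        Stats(long numRows, long rawDataSize) {
            this.numRows = numRows;
            this.rawDataSize = rawDataSize;
        }
    }

    /** Stand-in for the SerDe's per-row raw size; here simply the string length. */
    static long rawSizeOf(String row) {
        return row.length();
    }

    /** ORC-like writer: it tracks counts as part of writing, as a columnar format would for its footer. */
    static final class TrackingWriter implements StatsProvidingWriter {
        private long rows;
        private long rawBytes;

        @Override
        public void write(String row) {
            rows++;
            rawBytes += rawSizeOf(row);
            // Actual encoding and output would happen here.
        }

        @Override
        public long getNumberOfRows() { return rows; }

        @Override
        public long getRawDataSize() { return rawBytes; }
    }

    /** Writer without statistics support. */
    static final class PlainWriter implements RecordWriter {
        @Override
        public void write(String row) {
            // Output only; no statistics are kept.
        }
    }

    /**
     * Writes all rows, then returns stats. Uses the writer's own stats when it
     * offers them; otherwise accumulates them per row in the caller.
     */
    static Stats writeAndCollect(RecordWriter writer, List<String> rows) {
        boolean writerHasStats = writer instanceof StatsProvidingWriter;
        long numRows = 0;
        long rawSize = 0;
        for (String row : rows) {
            writer.write(row);
            if (!writerHasStats) {
                // Fallback path: duplicate per-row bookkeeping in the operator.
                numRows++;
                rawSize += rawSizeOf(row);
            }
        }
        if (writerHasStats) {
            StatsProvidingWriter sp = (StatsProvidingWriter) writer;
            return new Stats(sp.getNumberOfRows(), sp.getRawDataSize());
        }
        return new Stats(numRows, rawSize);
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a", "bb", "ccc");
        Stats fromWriter = writeAndCollect(new TrackingWriter(), rows);
        Stats fromLoop = writeAndCollect(new PlainWriter(), rows);
        System.out.println(fromWriter.numRows + " " + fromWriter.rawDataSize); // 3 6
        System.out.println(fromLoop.numRows + " " + fromLoop.rawDataSize);     // 3 6
    }
}
```

Both paths produce the same numbers; the difference is that with a stats-providing writer the operator no longer repeats the counting. On the read side, the description notes that ORC keeps these statistics in the file footer, so they can be obtained without scanning rows.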