[ https://issues.apache.org/jira/browse/HIVE-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780647#comment-13780647 ]
Hive QA commented on HIVE-5324:
-------------------------------
{color:green}Overall{color}: +1 all checks pass
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12605161/HIVE-5324.4.patch.txt
{color:green}SUCCESS:{color} +1 3179 tests passed
Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/940/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/940/console
Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}
This message is automatically generated.
> Extend record writer and ORC reader/writer interfaces to provide statistics
> ---------------------------------------------------------------------------
>
> Key: HIVE-5324
> URL: https://issues.apache.org/jira/browse/HIVE-5324
> Project: Hive
> Issue Type: New Feature
> Affects Versions: 0.13.0
> Reporter: Prasanth J
> Assignee: Prasanth J
> Labels: orcfile, statistics
> Fix For: 0.13.0
>
> Attachments: HIVE-5324.1.patch.txt, HIVE-5324.2.patch.txt,
> HIVE-5324.3.patch.txt, HIVE-5324.4.patch.txt
>
>
> The current implementation computes statistics (number of rows and raw
> data size) for every single row processed. The processOp() method in
> FileSinkOperator gets the raw data size for each row from the serde and
> accumulates the size in a hashmap while counting the number of rows. These
> accumulated statistics are then published to the metastore.
> In the case of ORC, the file format already stores enough statistics
> internally, which can be used when publishing the stats to the metastore.
> This avoids the duplication of work that happens in processOp(). Also,
> getting the statistics directly from ORC is very cheap (they can be read
> directly from the file footer).
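
The difference between the two approaches described above can be sketched roughly as follows. This is only an illustration under stated assumptions: StatsSketch, perRowStats, FooterTrackingWriter and footerStats are hypothetical names invented for this example, not Hive or ORC classes, and the byte-array length stands in for the raw data size that the serde would report.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types exist in Hive or ORC.
public class StatsSketch {

    /**
     * Per-row approach: the operator itself updates counters in a map
     * for every row it processes (the work described for processOp()).
     */
    static Map<String, Long> perRowStats(List<byte[]> rows) {
        Map<String, Long> stats = new HashMap<>();
        for (byte[] row : rows) {
            stats.merge("numRows", 1L, Long::sum);
            // Assumption: row.length stands in for the serde's raw data size.
            stats.merge("rawDataSize", (long) row.length, Long::sum);
        }
        return stats;
    }

    /**
     * Stand-in for a writer that already tracks statistics as part of
     * writing, the way ORC keeps them for its file footer.
     */
    static final class FooterTrackingWriter {
        private long numRows;
        private long rawDataSize;

        void write(byte[] row) {
            numRows++;
            rawDataSize += row.length;
        }

        /** Hypothetical accessor: returns the totals the writer already holds. */
        Map<String, Long> footerStats() {
            Map<String, Long> stats = new HashMap<>();
            stats.put("numRows", numRows);
            stats.put("rawDataSize", rawDataSize);
            return stats;
        }
    }

    public static void main(String[] args) {
        List<byte[]> rows = Arrays.asList(new byte[8], new byte[16], new byte[4]);

        // Approach 1: the operator counts on its own, duplicating the writer's work.
        Map<String, Long> fromOperator = perRowStats(rows);

        // Approach 2: the operator only writes and asks the writer once at the end.
        FooterTrackingWriter writer = new FooterTrackingWriter();
        for (byte[] row : rows) {
            writer.write(row);
        }
        Map<String, Long> fromWriter = writer.footerStats();

        System.out.println("operator-side stats: " + fromOperator);
        System.out.println("writer-side stats:   " + fromWriter);
    }
}
```

In this sketch both paths yield the same totals for a non-empty input, which is the point of the proposal: if the writer can report the numbers it already maintains, the operator does not need to keep a second set of per-row counters.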