[ 
https://issues.apache.org/jira/browse/HIVE-6578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-6578:
-----------------------------

    Attachment: HIVE-6578.3.patch

Addressed [~sershe]'s review comments. Had an offline discussion with 
[~owen.omalley] and addressed Owen's comment as well. The changes in this 
patch are:
1) Added a new StatsProvidingRecordReader interface (similar to 
StatsProvidingRecordWriter), which will be used to get stats from the record 
reader. This works even if a partition contains data written with different 
file formats.
2) Removed ORC-specific references from StatsNoJobTask. It should now work with 
any file format that implements the StatsProvidingRecordReader interface.
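
To illustrate the two changes above, here is a minimal, self-contained sketch of the idea, not the code in the patch. Only the interface name StatsProvidingRecordReader comes from this issue; the getStats() method, the BasicStats holder, the FooterBackedReader and PlainReader classes, and the StatsAggregator helper are hypothetical names used for illustration. The sketch also assumes that files whose reader does not provide stats are simply skipped, which the patch may handle differently.

```java
import java.util.List;

// Hypothetical holder for two of the basic stats named in the issue.
final class BasicStats {
    final long rowCount;
    final long rawDataSize;

    BasicStats(long rowCount, long rawDataSize) {
        this.rowCount = rowCount;
        this.rawDataSize = rawDataSize;
    }
}

// Reader-side counterpart of StatsProvidingRecordWriter.
// The method name and return type are assumptions for this sketch.
interface StatsProvidingRecordReader {
    BasicStats getStats();
}

// Hypothetical reader for a format that keeps stats in its file footer.
final class FooterBackedReader implements StatsProvidingRecordReader {
    private final BasicStats footerStats;

    FooterBackedReader(long rowCount, long rawDataSize) {
        this.footerStats = new BasicStats(rowCount, rawDataSize);
    }

    @Override
    public BasicStats getStats() {
        // No data scan: the stats are whatever the footer recorded.
        return footerStats;
    }
}

// Hypothetical reader for a format that exposes no stats.
final class PlainReader {
}

// Format-agnostic aggregation: the caller checks only for the interface,
// never for a specific file format.
final class StatsAggregator {
    static BasicStats aggregate(List<Object> readersForPartition) {
        long rows = 0;
        long rawSize = 0;
        for (Object reader : readersForPartition) {
            if (reader instanceof StatsProvidingRecordReader) {
                BasicStats s = ((StatsProvidingRecordReader) reader).getStats();
                rows += s.rowCount;
                rawSize += s.rawDataSize;
            }
            // Assumption: readers without stats are skipped here.
        }
        return new BasicStats(rows, rawSize);
    }
}
```

Because the aggregation depends only on the interface, each file in a partition can be handled by its own reader, which is what would allow a partition with mixed file formats to be processed.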

> Use ORC file footer statistics through StatsProvidingRecordReader interface 
> for analyze command
> -----------------------------------------------------------------------------------------------
>
>                 Key: HIVE-6578
>                 URL: https://issues.apache.org/jira/browse/HIVE-6578
>             Project: Hive
>          Issue Type: New Feature
>    Affects Versions: 0.13.0
>            Reporter: Prasanth J
>            Assignee: Prasanth J
>              Labels: orcfile
>         Attachments: HIVE-6578.1.patch, HIVE-6578.2.patch, HIVE-6578.3.patch
>
>
> ORC provides file level statistics which can be used in analyze partialscan 
> and noscan cases to compute basic statistics like number of rows, number of 
> files, total file size and raw data size. On the writer side, a new interface 
> was added earlier (StatsProvidingRecordWriter) that exposed stats when 
> writing a table. Similarly, a new interface StatsProvidingRecordReader can be 
> added which when implemented should provide stats that are gathered by the 
> underlying file format.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
