[
https://issues.apache.org/jira/browse/HIVE-14828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rajesh Balamohan updated HIVE-14828:
------------------------------------
Attachment: HIVE-14828.1.patch
It is better to use HDFS for the stats file. Rebasing the patch for the master branch.
> Cloud/S3: Stats publishing should be on HDFS instead of S3
> ----------------------------------------------------------
>
> Key: HIVE-14828
> URL: https://issues.apache.org/jira/browse/HIVE-14828
> Project: Hive
> Issue Type: Improvement
> Components: Statistics
> Affects Versions: 1.2.0
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Priority: Minor
> Fix For: 1.3.0
>
> Attachments: HIVE-14828.1.patch, HIVE-14828.branch-1.2.001.patch,
> HIVE-14828.branch-2.0.001.patch
>
>
> Currently, stats files are created in S3. Later, as part of
> FSStatsAggregator, these files are read back and the metastore is populated again.
> {noformat}
> 2016-09-23 05:57:46,772 INFO [main]: fs.FSStatsPublisher
> (FSStatsPublisher.java:init(49)) - created :
> s3a://BUCKET/test/.hive-staging_hive_2016-09-23_05-57-34_309_2648485988937054815-1/-ext-10001
> 2016-09-23 05:57:46,773 DEBUG [main]: fs.FSStatsAggregator
> (FSStatsAggregator.java:connect(53)) - About to read stats from :
> s3a://BUCKET/test/.hive-staging_hive_2016-09-23_05-57-34_309_2648485988937054815-1/-ext-10001
> {noformat}
> Instead, the stats files can be written directly to HDFS and read back from
> there rather than from S3, which would save a couple of calls to S3 (a sketch
> of the idea follows).
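> A minimal sketch of the idea, using only Hadoop's FileSystem API: point the
> temporary stats directory at the HDFS scratch dir (hive.exec.scratchdir)
> instead of the s3a staging location, so the publish/aggregate round trip stays
> on HDFS. This is not the actual patch; the helper name and query id below are
> hypothetical.
> {noformat}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FSDataOutputStream;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> public class StatsTmpDirSketch {
>   // Hypothetical helper: derive the stats tmp dir from the HDFS scratch dir
>   // rather than from the job's S3 output location.
>   static Path statsTmpDir(Configuration conf, String queryId) {
>     String scratch = conf.get("hive.exec.scratchdir", "/tmp/hive");
>     return new Path(scratch, queryId + "-stats");
>   }
>
>   public static void main(String[] args) throws Exception {
>     Configuration conf = new Configuration();
>     Path tmp = statsTmpDir(conf, "query_123");   // hypothetical query id
>
>     // Resolve the FileSystem from the path itself: an hdfs:// scratch dir
>     // yields HDFS, so neither publish nor aggregate touches s3a://.
>     FileSystem fs = tmp.getFileSystem(conf);
>     fs.mkdirs(tmp);
>
>     // Publish side: write a small per-task stats file.
>     try (FSDataOutputStream out = fs.create(new Path(tmp, "part-0.stats"))) {
>       out.writeBytes("numRows=42\trawDataSize=1024\n");
>     }
>
>     // Aggregate side: read the files back from HDFS; in Hive the totals
>     // would then be pushed into the metastore.
>     System.out.println(fs.listStatus(tmp).length + " stats file(s) under " + tmp);
>     fs.delete(tmp, true);
>   }
> }
> {noformat}
> With the stats directory resolved this way, only the final table data lives on
> S3; the two S3 round trips shown in the log above become HDFS operations.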