Rajesh Balamohan created HIVE-14828:
---------------------------------------
Summary: Cloud/S3: Stats publishing should be on HDFS instead of S3
Key: HIVE-14828
URL: https://issues.apache.org/jira/browse/HIVE-14828
Project: Hive
Issue Type: Improvement
Components: Statistics
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
Priority: Minor
Currently, stats files are created on S3. Later, FSStatsAggregator reads the
same file back from S3 and populates the metastore (MS).
{noformat}
2016-09-23 05:57:46,772 INFO [main]: fs.FSStatsPublisher
(FSStatsPublisher.java:init(49)) - created :
s3a://BUCKET/test/.hive-staging_hive_2016-09-23_05-57-34_309_2648485988937054815-1/-ext-10001
2016-09-23 05:57:46,773 DEBUG [main]: fs.FSStatsAggregator
(FSStatsAggregator.java:connect(53)) - About to read stats from :
s3a://BUCKET/test/.hive-staging_hive_2016-09-23_05-57-34_309_2648485988937054815-1/-ext-10001
{noformat}
Instead, the stats could be written directly to HDFS and read back from there
rather than from S3, which would save a couple of calls to S3.
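A rough sketch of the idea, not the actual patch: when the staging directory is on a blob store, redirect the temporary stats directory to an HDFS scratch location and leave it untouched otherwise. The class StatsTmpDirChooser, its methods, the scheme list, and the scratch URI are all hypothetical names used for illustration; none of them are Hive APIs.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (not part of Hive): decides where the temporary
// stats file should live for a given staging directory.
final class StatsTmpDirChooser {

    // Assumption: these schemes identify blob stores such as S3.
    private static final Set<String> BLOB_SCHEMES =
            new HashSet<>(Arrays.asList("s3", "s3n", "s3a"));

    private StatsTmpDirChooser() {}

    // True if the URI's scheme is one of the assumed blob-store schemes.
    static boolean isBlobStore(URI path) {
        String scheme = path.getScheme();
        return scheme != null
                && BLOB_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
    }

    // Returns the staging dir unchanged when it is not on a blob store.
    // Otherwise re-roots the staging path under the given HDFS scratch dir,
    // keeping the full original path so the per-query name stays unique.
    static URI chooseStatsDir(URI stagingDir, URI hdfsScratchDir) {
        if (!isBlobStore(stagingDir)) {
            return stagingDir;
        }
        String base = hdfsScratchDir.toString();
        if (base.endsWith("/")) {
            base = base.substring(0, base.length() - 1);
        }
        return URI.create(base + stagingDir.getRawPath());
    }
}
```

With the staging path from the log above and an assumed scratch directory such as hdfs://namenode:8020/tmp/hive/stats, both the publisher and the aggregator would be pointed at the HDFS location, so neither the write nor the read of the stats file would go to S3.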
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)