[
https://issues.apache.org/jira/browse/HADOOP-12949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15206107#comment-15206107
]
Steve Loughran commented on HADOOP-12949:
-----------------------------------------
There's actually some metrics collection in the OpenStack Swift client; look
under {{org.apache.hadoop.fs.swift.util.DurationStats}}. The stats are logged
primarily to stdout and list min, max, (moving) arithmetic mean and stddev by
HTTP verb.
# It's pretty low cost to do this; even when htrace sampling is inactive, the
stats for an FS can be collected.
# The stats showed that Rackspace UK throttles delete requests: the more files
in a directory I was cleaning up on teardown, the longer it took, and the time
grew exponentially rather than linearly.
# I didn't hook the code up to the normal Hadoop metrics; it's something I'd
add as an option now, because it becomes something you need to monitor now that
we are shifting to longer-lived applications.
# I'd add more on causes of operations, specifically: open(), seek(), duration
of close(), delete(). These are the things where the fact that object stores
are generally O(files*data) means they don't work as expected; finding that
mismatch of expectations matters.
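As a rough illustration of the per-verb statistics described above (min, max,
mean, stddev), here is a minimal sketch. It is not the code in
{{DurationStats}}: the class name {{VerbDurationStats}} and all of its methods
are hypothetical, and it keeps a cumulative running mean using Welford's method
rather than whatever moving average the real class implements.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch only; not the hadoop-openstack DurationStats class. */
public class VerbDurationStats {

    /** Running statistics for one HTTP verb, updated with Welford's method. */
    public static final class Stats {
        private long count;
        private long min = Long.MAX_VALUE;
        private long max = Long.MIN_VALUE;
        private double mean;
        private double m2; // sum of squared deviations from the running mean

        void add(long millis) {
            count++;
            min = Math.min(min, millis);
            max = Math.max(max, millis);
            double delta = millis - mean;
            mean += delta / count;
            m2 += delta * (millis - mean);
        }

        public long count() { return count; }
        public long min() { return min; }
        public long max() { return max; }
        public double mean() { return mean; }

        /** Sample standard deviation; 0 until there are at least two samples. */
        public double stddev() {
            return count > 1 ? Math.sqrt(m2 / (count - 1)) : 0.0;
        }
    }

    // Sorted by verb so the report order is stable.
    private final Map<String, Stats> byVerb = new TreeMap<>();

    /** Record one completed request; synchronized so threads can share it. */
    public synchronized void record(String verb, long millis) {
        byVerb.computeIfAbsent(verb, v -> new Stats()).add(millis);
    }

    /** Returns the stats for a verb, or null if none were recorded. */
    public synchronized Stats get(String verb) {
        return byVerb.get(verb);
    }

    /** One line per verb, suitable for logging to stdout. */
    public synchronized String report() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Stats> e : byVerb.entrySet()) {
            Stats s = e.getValue();
            sb.append(String.format(
                "%s count=%d min=%d max=%d mean=%.2f stddev=%.2f%n",
                e.getKey(), s.count(), s.min(), s.max(), s.mean(), s.stddev()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        VerbDurationStats stats = new VerbDurationStats();
        // Made-up durations in milliseconds, purely for illustration.
        for (long d : new long[] {10, 20, 30}) {
            stats.record("DELETE", d);
        }
        stats.record("GET", 5);
        System.out.print(stats.report());
    }
}
```

The per-sample update is a handful of arithmetic operations, which is in line
with the point that collection is cheap enough to leave on all the time.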
More and more object stores are coming in. While S3 is the main one, it'd be
good to have the core stuff store-neutral. The classes from hadoop-openstack
can be moved if that helps; the per-verb stuff is useful at the deep levels,
while htrace monitoring can track the cost of specific actions.
> Add HTrace to the s3a connector
> -------------------------------
>
> Key: HADOOP-12949
> URL: https://issues.apache.org/jira/browse/HADOOP-12949
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Madhawa Gunasekara
>
> Hi All,
> s3, GCS, WASB, and other cloud blob stores are becoming increasingly
> important in Hadoop. But we don't have distributed tracing for these yet. It
> would be interesting to add distributed tracing here. It would enable
> collecting really interesting data like probability distributions of PUT and
> GET requests to s3 and their impact on MR jobs, etc.
> I would like to implement this feature. Please shed some light on this.
> Thanks,
> Madhawa