[
https://issues.apache.org/jira/browse/HADOOP-16830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17161970#comment-17161970
]
Luca Canali edited comment on HADOOP-16830 at 7/21/20, 12:21 PM:
-----------------------------------------------------------------
[[email protected]] I have compiled and also briefly tested the PR with Spark
reading from S3A, and my first exploration looks quite good to me. As mentioned
previously, one of my goals with this is to add time-based metrics to IO
Statistics, as in this [proof-of-concept implementation of some read time
metrics for
S3A|https://github.com/LucaCanali/hadoop/commit/4ed077061e5826711307941dd397250e2afc47a2].
I was wondering whether it would make sense to include in this PR a list of
Statistics names for time-based IO instrumentation, so as to guide the naming
convention and future implementation efforts?
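To make the question about names more concrete, here is a rough sketch of what
time-based statistics could look like next to a plain operation counter. It is
only an illustration: the class TimedIOStats, its methods and the two statistic
names are placeholders I made up for this example, not names taken from the PR
or from the proof-of-concept commit.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

/** Illustrative only: not part of Hadoop or of the PR under discussion. */
final class TimedIOStats {
  // Placeholder statistic names; the suffix encodes the unit of the value.
  static final String STREAM_READ_TIME_NANOS = "stream_read_time_nanos";
  static final String STREAM_READ_OPERATIONS = "stream_read_operations";

  // One thread-safe accumulator per statistic name.
  private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

  /** Adds a value to the named statistic, creating it on first use. */
  void add(String name, long value) {
    counters.computeIfAbsent(name, k -> new LongAdder()).add(value);
  }

  /** Returns the current value of the named statistic, or 0 if never set. */
  long get(String name) {
    LongAdder adder = counters.get(name);
    return adder == null ? 0L : adder.sum();
  }

  /** Runs the operation, recording its elapsed time and one invocation. */
  <T> T timed(String timeName, String countName, Supplier<T> op) {
    long start = System.nanoTime();
    try {
      return op.get();
    } finally {
      // Recorded even if the operation throws.
      add(timeName, System.nanoTime() - start);
      add(countName, 1);
    }
  }
}
```

With agreed names, a caller would wrap each read in timed(...) and later report
the time statistic divided by the operation count as a mean latency.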
> Add public IOStatistics API; S3A to support
> -------------------------------------------
>
> Key: HADOOP-16830
> URL: https://issues.apache.org/jira/browse/HADOOP-16830
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs, fs/s3
> Affects Versions: 3.3.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Major
>
> Applications like to collect statistics on specific operations, by collecting
> exactly those operations done during the execution of FS API calls by their
> individual worker threads, and returning these to their job driver.
> * S3A has a statistics API for some streams, but it's a non-standard one;
> Impala &c can't use it
> * FileSystem storage statistics are public, but as they aren't cross-thread,
> they don't aggregate properly
> Proposed
> # A new IOStatistics interface to serve up statistics
> # S3A to implement
> # other stores to follow
> # Pass-through from the usual wrapper classes (FS data input/output streams)
> It's hard to think about how best to offer an API for operation context
> stats, and how to actually implement it.
> ThreadLocal isn't enough because the helper threads need to update the
> thread-local value of the instigator.
> My initial PoC doesn't address that issue, but it shows what I'm thinking of.
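
The ThreadLocal limitation described in the issue can be shown with a small
sketch. It assumes a hypothetical OpStats holder and a hypothetical
withContext() helper, neither of which comes from the PoC; the point is only
that a helper thread calling ThreadLocal.get() sees its own instance, whereas a
reference captured on the instigating thread and handed to the task does reach
the instigator's statistics.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Consumer;

/** Illustrative only: the names here are hypothetical, not Hadoop APIs. */
final class OperationContextDemo {
  /** Per-operation statistics holder, safe for concurrent updates. */
  static final class OpStats {
    final LongAdder bytesRead = new LongAdder();
  }

  // A plain ThreadLocal: every thread gets its own, separate OpStats.
  static final ThreadLocal<OpStats> LOCAL = ThreadLocal.withInitial(OpStats::new);

  /** Wraps a task so it updates the stats captured on the submitting thread. */
  static Runnable withContext(OpStats captured, Consumer<OpStats> body) {
    return () -> body.accept(captured);
  }

  /** Returns the bytes visible to the instigating thread after both helpers ran. */
  static long run() {
    OpStats instigator = new OpStats();
    LOCAL.set(instigator);
    ExecutorService pool = Executors.newFixedThreadPool(2);
    // Helper 1 updates its own thread-local copy: lost to the instigator.
    pool.execute(() -> LOCAL.get().bytesRead.add(100));
    // Helper 2 updates the captured context: visible to the instigator.
    pool.execute(withContext(instigator, s -> s.bytesRead.add(200)));
    pool.shutdown();
    try {
      if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
        throw new IllegalStateException("helpers did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return instigator.bytesRead.sum();
  }
}
```

Calling OperationContextDemo.run() returns 200 rather than 300: the 100 bytes
recorded through the helper's own thread-local never reach the instigator.
Explicitly passing the context is just one possible design; the issue leaves
open how the API should actually do this.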
--
This message was sent by Atlassian Jira
(v8.3.4#803005)