[
https://issues.apache.org/jira/browse/HADOOP-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294820#comment-16294820
]
Steve Loughran commented on HADOOP-15124:
-----------------------------------------
You've got some good improvements there. Interesting that even though Statistics
is thread local, you are still seeing improvements, implying it's how
ThreadLocal does its deref that is the bottleneck.
But: ThreadLocal does let you get at the specific stats for a thread, which
means the isolated IO for that specific operation, which you can tie back into
the query in progress. That's actually something I want reinstated (somehow)
for the new API, as I'm trying to feed stats back from spark queries in a
shared cluster to the actual operation.
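
To make the per-thread point concrete, here is a minimal sketch of counters
that keep a per-thread view alongside an aggregate total. The class and method
names (PerThreadByteCounter, incrementBytesRead, and so on) are hypothetical
and are not the FileSystem.Statistics or StorageStatistics API; it only
illustrates the idea, assuming each operation runs on its own thread.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical illustration only; not a Hadoop class.
class PerThreadByteCounter {
    // Every per-thread counter is also registered here so totals can be summed.
    private final Queue<LongAdder> allCounters = new ConcurrentLinkedQueue<>();

    // Each thread lazily gets its own counter on first use.
    private final ThreadLocal<LongAdder> local = ThreadLocal.withInitial(() -> {
        LongAdder adder = new LongAdder();
        allCounters.add(adder);
        return adder;
    });

    // Record bytes read by the calling thread.
    void incrementBytesRead(long bytes) {
        local.get().add(bytes);
    }

    // Isolated view: only the IO done by the calling thread.
    long bytesReadByCurrentThread() {
        return local.get().sum();
    }

    // Aggregate view: IO across every thread that has used this counter.
    long totalBytesRead() {
        long total = 0;
        for (LongAdder adder : allCounters) {
            total += adder.sum();
        }
        return total;
    }
}
```

A caller that wants the stats for one operation would read
bytesReadByCurrentThread() before and after the work and take the difference.
Note that this sketch never removes the counters of finished threads, which a
real implementation would have to deal with.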
And FileSystem.Statistics is public in a class tagged Public/Stable: we cannot
cut it or things will break and people will point to our compatibility
guidelines and make us put it back. Which is the problem HADOOP-13032 has.
What to do? The ongoing stats work is in the new StorageStatistics code, which
is intended to support many more counters than the simple bytes read/written
values; have a look at the DFSOpsCountStatistics and S3AStorageStatistics.
They're using AtomicLong, so there are opportunities to improve things there,
especially while we can evolve that, leaving Statistics alone, or, if we can
work out how to give it a view of the newer StorageStats, making things faster
without breaking code.
Moving this under a newly created JIRA for the work, HADOOP-15125. Unless
[~liuml07] puts his hand up, I don't think it's going to get active use from
many people. It'd be great if you were to get involved in this, especially as
Hadoop 3 is Java 8+ only, so we are allowed to use LongAdder and the like.
That said: start off being unambitious in changes, use the new API yourself and
see what can be done to improve it/bridge efficiently to FS.Statistics, as big
changes always meet more resistance.
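
As a rough idea of what a small, unambitious change could look like, here is
a minimal sketch of a set of named counters backed by
java.util.concurrent.atomic.LongAdder rather than AtomicLong. The OpType enum
and the AdderBackedCounters class are hypothetical names made up for this
example; this is not the actual DFSOpsCountStatistics or S3AStorageStatistics
code.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical operation names, for illustration only.
enum OpType { BYTES_READ, BYTES_WRITTEN }

// Hypothetical illustration only; not a Hadoop class.
class AdderBackedCounters {
    // The map is fully populated in the constructor and never modified
    // afterwards, so concurrent reads of it are safe.
    private final Map<OpType, LongAdder> counters = new EnumMap<>(OpType.class);

    AdderBackedCounters() {
        for (OpType op : OpType.values()) {
            counters.put(op, new LongAdder());
        }
    }

    // LongAdder spreads contended updates over internal cells, so threads
    // updating the same counter do not all retry on a single variable.
    void increment(OpType op, long delta) {
        counters.get(op).add(delta);
    }

    // sum() is not an atomic snapshot while updates are in flight, which is
    // normally acceptable for statistics.
    long get(OpType op) {
        return counters.get(op).sum();
    }
}
```

The trade-off is that reads become slightly more expensive and less exact
under concurrent updates, in exchange for cheaper writes, which suits counters
that are updated on every IO call and read rarely.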
> Slow FileSystem.Statistics counters implementation
> --------------------------------------------------
>
> Key: HADOOP-15124
> URL: https://issues.apache.org/jira/browse/HADOOP-15124
> Project: Hadoop Common
> Issue Type: Improvement
> Components: common
> Affects Versions: 2.9.0, 2.8.3, 2.7.5, 3.0.0
> Reporter: Igor Dvorzhak
> Labels: common, filesystem, statistics
>
> While profiling a 1TB TeraGen job on a Hadoop 2.8.2 cluster (Google Dataproc, 2
> workers, GCS connector) I saw that the FileSystem.Statistics code paths took
> 5.58% of wall time and 26.5% of CPU time of the total execution time.
> After switching FileSystem.Statistics implementation to LongAdder, consumed
> Wall time decreased to 0.006% and CPU time to 0.104% of total execution time.
> Total job runtime decreased from 66 mins to 61 mins.
> These results are not conclusive, because I didn't benchmark multiple times
> to average results, but regardless of performance gains switching to
> LongAdder simplifies code and reduces its complexity.