[jira] [Commented] (HADOOP-15124) Slow FileSystem.Statistics counters implementation

Steve Loughran (JIRA) Mon, 18 Dec 2017 03:14:45 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294820#comment-16294820
 ]


Steve Loughran commented on HADOOP-15124:
-----------------------------------------

You've got some good improvements there. Interesting that even though Statistics 
is thread-local, you are still seeing improvements, implying it's how 
ThreadLocal does its deref that is the bottleneck.

But: ThreadLocal does let you get at the stats for a specific thread, and so at 
the isolated IO of that specific operation, which you can tie back to the 
query in progress. That's actually something I want reinstated (somehow) 
for the new API, as I'm trying to feed stats back from Spark queries in a 
shared cluster to the actual operation.

And FileSystem.Statistics is public in a class tagged Public/Stable: we cannot 
cut it, or things will break and people will point to our compatibility 
guidelines and make us put it back. That is the problem HADOOP-13032 has.

What to do? The ongoing stats work is in the new StorageStatistics code, which 
is intended to support many more counters than the simple bytes read/written 
values; have a look at DFSOpsCountStatistics and S3AStorageStatistics. 
They use AtomicLong, so there are opportunities to improve things there, 
especially while we can still evolve that API. We can leave Statistics alone 
or, if we can work out how to give it a view of the newer StorageStatistics, 
make things faster without breaking code.
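As an illustration of the kind of improvement meant here, a minimal sketch of a named-counter store backed by LongAdder instead of AtomicLong. NamedCounters and its methods are hypothetical, loosely modelled on the idea of per-operation counters; they are not the StorageStatistics API.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/**
 * Hypothetical named counters (not a Hadoop class). LongAdder spreads
 * contended updates across internal cells instead of retrying a CAS
 * on one shared value, as AtomicLong does.
 */
final class NamedCounters {
  private final ConcurrentHashMap<String, LongAdder> counters = new ConcurrentHashMap<>();

  /** Adds delta to the named counter, creating it on first use. */
  void increment(String key, long delta) {
    LongAdder adder = counters.get(key); // fast path: plain lookup
    if (adder == null) {
      adder = counters.computeIfAbsent(key, k -> new LongAdder());
    }
    adder.add(delta);
  }

  /** Current value, or 0 if the counter has never been touched. */
  long get(String key) {
    LongAdder adder = counters.get(key);
    return adder == null ? 0L : adder.sum();
  }

  /** Sorted copy of all counters; not an atomic snapshot under concurrent updates. */
  Map<String, Long> snapshot() {
    Map<String, Long> out = new TreeMap<>();
    counters.forEach((k, v) -> out.put(k, v.sum()));
    return out;
  }
}
```

The design trade-off: increments become cheap under contention, but sum() walks every cell, so reads cost more. That suits counters that are bumped on every IO call and read rarely.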

I'm moving this under HADOOP-15125, a newly created JIRA for this work. Unless 
[~liuml07] puts his hand up, I don't think it's going to get active attention 
from many people. It'd be great if you were to get involved in this, especially 
as Hadoop 3 is Java 8+ only, so we are allowed to use LongAdder and the like.

That said: start off being unambitious in your changes. Use the new API yourself 
and see what can be done to improve it or bridge it efficiently to FS.Statistics, 
as big changes always meet more resistance.
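One possible shape for giving Statistics a view of the newer counters is a read-only adapter: legacy-style getters computed from the new counters, so the IO path records each value only once. This is a hypothetical sketch; IoCounters and LegacyStatisticsView are made-up names, and the getter names only resemble those on FileSystem.Statistics.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical new-style counters: the only place IO is recorded. */
final class IoCounters {
  final LongAdder bytesRead = new LongAdder();
  final LongAdder bytesWritten = new LongAdder();
}

/**
 * Hypothetical read-only bridge: legacy-style getters computed on demand
 * from IoCounters, so the hot path updates one set of counters, not two.
 */
final class LegacyStatisticsView {
  private final IoCounters counters;

  LegacyStatisticsView(IoCounters counters) {
    this.counters = counters;
  }

  long getBytesRead() {
    return counters.bytesRead.sum();
  }

  long getBytesWritten() {
    return counters.bytesWritten.sum();
  }
}
```

A real bridge would also have to cover reset(), the per-thread data and the rest of the public class, which is where the compatibility constraints bite.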




> Slow FileSystem.Statistics counters implementation
> --------------------------------------------------
>
>                 Key: HADOOP-15124
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15124
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: common
>    Affects Versions: 2.9.0, 2.8.3, 2.7.5, 3.0.0
>            Reporter: Igor Dvorzhak
>              Labels: common, filesystem, statistics
>
> While profiling a 1TB TeraGen job on a Hadoop 2.8.2 cluster (Google Dataproc, 2 
> workers, GCS connector), I saw that FileSystem.Statistics code paths accounted 
> for 5.58% of the job's wall time and 26.5% of its CPU time.
> After switching the FileSystem.Statistics implementation to LongAdder, their 
> share dropped to 0.006% of wall time and 0.104% of CPU time.
> Total job runtime decreased from 66 mins to 61 mins.
> These results are not conclusive, because I didn't benchmark multiple times 
> to average results, but regardless of performance gains, switching to 
> LongAdder simplifies the code and reduces its complexity.
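For readers comparing the two counter types, a small hypothetical demo (not the patch on this issue) runs identical increments from several threads through an AtomicLong and a LongAdder. Both end at the same total; what differs is the cost under contention, which this demo does not time, so it makes no performance claim of its own.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical demo: the same increments through AtomicLong and LongAdder. */
final class CounterSwapDemo {
  /** Returns {atomicTotal, adderTotal} after all workers finish. */
  static long[] run(int threads, int perThread) throws InterruptedException {
    AtomicLong atomic = new AtomicLong();
    LongAdder adder = new LongAdder();
    Thread[] workers = new Thread[threads];
    for (int i = 0; i < threads; i++) {
      workers[i] = new Thread(() -> {
        for (int j = 0; j < perThread; j++) {
          atomic.incrementAndGet(); // CAS on one shared value; retries under contention
          adder.increment();        // spreads contended updates over cells; summed on read
        }
      });
      workers[i].start();
    }
    for (Thread w : workers) {
      w.join(); // join() makes the workers' updates visible to this thread
    }
    return new long[] {atomic.get(), adder.sum()};
  }
}
```

LongAdder lives in java.util.concurrent.atomic from Java 8 onward.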



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
