[
https://issues.apache.org/jira/browse/HDFS-5276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782307#comment-13782307
]
Colin Patrick McCabe commented on HDFS-5276:
--------------------------------------------
bq. How do you know all the threads that are maintaining thread local variables?
The first time a thread tries to access the thread-local variable, it will get
null. At that point, the thread creates its thread-local counters object,
takes a mutex, and adds a reference to that object to the list inside FileSystem.
Periodically, we go over the list of thread-local counters and sum them up into a
total. (We also do that summation when reading statistics.) At that point, we
remove any counters that belong to threads that no longer exist.
Check out the "flat combining" paper, which is a more abstract description of
this idea: http://www.cs.bgu.ac.il/~hendlerd/papers/flat-combining.pdf
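A minimal sketch of the scheme described above, under stated assumptions: this is not the actual Hadoop patch, and the names ThreadLocalStats, Counter, and retired are hypothetical. It aggregates only when statistics are read and leaves out the periodic pass. Counts from threads that have exited are folded into a running total before their entries are dropped, so that pruning does not lose data.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration; not the real FileSystem.Statistics class.
final class ThreadLocalStats {
    /** Per-thread counter: written only by its owner, read by the aggregator. */
    private static final class Counter {
        // Weak reference so the list does not keep dead Thread objects alive.
        final WeakReference<Thread> owner =
            new WeakReference<>(Thread.currentThread());
        volatile long bytesRead;
    }

    private final ThreadLocal<Counter> local = new ThreadLocal<>();
    private final List<Counter> all = new ArrayList<>(); // guarded by "this"
    private long retired; // bytes from exited threads; guarded by "this"

    public void incrementBytesRead(long n) {
        Counter c = local.get();
        if (c == null) {              // first access from this thread
            c = new Counter();
            local.set(c);
            synchronized (this) {     // mutex taken once per thread, not per read
                all.add(c);
            }
        }
        // Single writer per counter, so a plain read-modify-write is enough;
        // volatile only makes the value visible to the aggregating thread.
        c.bytesRead += n;
    }

    public synchronized long getBytesRead() {
        long sum = retired;
        for (Iterator<Counter> it = all.iterator(); it.hasNext();) {
            Counter c = it.next();
            Thread t = c.owner.get();
            // Check liveness before reading, so a dead thread's final
            // value is the one that gets folded into "retired".
            boolean dead = (t == null || !t.isAlive());
            long v = c.bytesRead;
            sum += v;
            if (dead) {
                retired += v;
                it.remove();
            }
        }
        return sum;
    }
}
```

The hot path touches only memory owned by the calling thread, so readers no longer contend on a single shared cache line; the cost moves to getBytesRead(), which is expected to be called far less often.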
> FileSystem.Statistics got performance issue on multi-thread read/write.
> -----------------------------------------------------------------------
>
> Key: HDFS-5276
> URL: https://issues.apache.org/jira/browse/HDFS-5276
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.0.4-alpha
> Reporter: Chengxiang Li
> Attachments: DisableFSReadWriteBytesStat.patch,
> HDFSStatisticTest.java, hdfs-test.PNG, jstack-trace.PNG
>
>
> FileSystem.Statistics is a singleton variable for each FS scheme, and each
> read/write on HDFS leads to an AtomicLong.getAndAdd() call. AtomicLong does
> not perform well under heavy multi-threaded contention (say, more than 30
> threads), so it may cause a serious performance issue. During our Spark test
> profiling, with 32 threads reading data from HDFS, about 70% of CPU time was
> spent in FileSystem.Statistics.incrementBytesRead().