[ https://issues.apache.org/jira/browse/HDFS-5276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793430#comment-13793430 ]

Haohui Mai commented on HDFS-5276:
----------------------------------

[~decster], that's quite impressive. It might be worthwhile to generalize the approach. Before that, though, I'm wondering what the performance would look like if you simply marked the variable as volatile. It might not be as good as the thread-local approach, since the JVM specification enforces memory fences on accesses to volatile variables, but it is the simplest approach and it generalizes easily across the code base.

I'd appreciate it if you could test the volatile approach on the same setup and report the result. By then we'll have a much better understanding of which approach we should use to write the statistics code.

Thanks!

> FileSystem.Statistics got performance issue on multi-thread read/write.
> -----------------------------------------------------------------------
>
>                 Key: HDFS-5276
>                 URL: https://issues.apache.org/jira/browse/HDFS-5276
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.0.4-alpha
>            Reporter: Chengxiang Li
>            Assignee: Colin Patrick McCabe
>         Attachments: DisableFSReadWriteBytesStat.patch, HDFS-5276.001.patch,
> HDFS-5276.002.patch, HDFSStatisticTest.java, hdfs-test.PNG, jstack-trace.PNG,
> TestFileSystemStatistics.java, ThreadLocalStat.patch
>
>
> FileSystem.Statistics is a singleton variable for each FS scheme, and each
> read/write on HDFS leads to an AtomicLong.getAndAdd() call. AtomicLong does
> not perform well with many threads (say, more than 30), so it may
> cause a serious performance issue. During our Spark test profile, 32 threads
> read data from HDFS, and about 70% of CPU time is spent in
> FileSystem.Statistics.incrementBytesRead().



--
This message was sent by Atlassian JIRA
(v6.1#6144)
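
For readers comparing the three options discussed in this thread, here is a minimal Java sketch of a shared AtomicLong, a plain volatile field, and per-thread counters. It is not the code from any of the attached patches: the class and method names (BytesReadCounters, SharedAtomic, SharedVolatile, PerThread, hammer) are hypothetical and chosen only for illustration. One point it is meant to show is that a volatile field alone gives visibility but not atomic increments, so it is only safe when each field has a single writer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;

// Illustrative sketch only: all names here are hypothetical, not Hadoop's code.
final class BytesReadCounters {

    /** Baseline from the issue description: all threads update one AtomicLong. */
    static final class SharedAtomic {
        private final AtomicLong bytesRead = new AtomicLong();
        void increment(long n) { bytesRead.getAndAdd(n); }
        long get() { return bytesRead.get(); }
    }

    /**
     * Plain volatile field. Reads always see the latest write, but "+=" is a
     * separate read and write, so concurrent writers can lose updates.
     */
    static final class SharedVolatile {
        private volatile long bytesRead;
        void increment(long n) { bytesRead += n; } // not atomic with >1 writer
        long get() { return bytesRead; }
    }

    /** Per-thread cells: one writer per cell, readers sum over all cells. */
    static final class PerThread {
        private static final class Cell { volatile long value; }

        private final List<Cell> cells = new ArrayList<>();
        private final ThreadLocal<Cell> local = ThreadLocal.withInitial(() -> {
            Cell c = new Cell();
            synchronized (cells) { cells.add(c); } // registered once per thread
            return c;
        });

        void increment(long n) {
            Cell c = local.get();
            c.value = c.value + n; // safe: only the owning thread writes this cell
        }

        long get() {
            long sum = 0;
            synchronized (cells) {
                for (Cell c : cells) { sum += c.value; }
            }
            return sum;
        }
    }

    /** Runs `threads` threads, each calling inc.accept(1) `perThread` times. */
    static void hammer(int threads, int perThread, LongConsumer inc) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) { inc.accept(1L); }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }
}
```

Calling hammer(8, 20000, counter::increment) on SharedAtomic or PerThread should leave the total at exactly 160000, while SharedVolatile may come up short under contention. In this sketch the cells of finished threads stay in the list, so totals survive thread exit at the cost of some retained memory; a real implementation would need to decide how to handle that. On Java 8 and later, java.util.concurrent.atomic.LongAdder offers a comparable striped design out of the box.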