[ 
https://issues.apache.org/jira/browse/HDFS-5276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13794740#comment-13794740
 ] 

Colin Patrick McCabe commented on HDFS-5276:
--------------------------------------------

thanks for the review, Andrew.

bq. FileSystem#reset is now resetting all the statistics, not just bytes 
written and read. This makes sense to me. It looks like the current behavior is 
just an oversight from when the additional statistics were added in 
HADOOP-6859; maybe Suresh Srinivas as the original author can comment.

Yeah, that's what I think too.  Agree that it would be nice to get some 
comments on that.

bq. Resetting by taking the current total, #negate-ing, then adding is not 
immediately obvious to the reader. Is there a reason we don't just have a 
#reset method and reset each StatisticData in turn?

There is a reason.  We can't modify the thread-local data... only the threads 
themselves can do that, or else there's a race condition.  I added a comment 
explaining this better.

bq. I'd also prefer just to not reuse the visitor / visitAll [in reset] too, 
since it's kind of weird semantically: #total() mutates the total 
StatisticsData member, and it depends on #total() being called exactly once at 
the end.

I actually thought it was a nice example of code reuse.  Iterating over all the 
{{StatisticsData}} objects is complex and I would not want to duplicate that 
code.  I added a comment to {{visitAll}} that describes the fact that 
{{aggregate}} (previously called {{total}}) is called exactly once at the end-- 
hopefully that makes it more obvious.

> FileSystem.Statistics got performance issue on multi-thread read/write.
> -----------------------------------------------------------------------
>
>                 Key: HDFS-5276
>                 URL: https://issues.apache.org/jira/browse/HDFS-5276
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.0.4-alpha
>            Reporter: Chengxiang Li
>            Assignee: Colin Patrick McCabe
>         Attachments: DisableFSReadWriteBytesStat.patch, HDFS-5276.001.patch, 
> HDFS-5276.002.patch, HDFSStatisticTest.java, hdfs-test.PNG, jstack-trace.PNG, 
> TestFileSystemStatistics.java, ThreadLocalStat.patch
>
>
> FileSystem.Statistics is a singleton variable for each FS scheme, each 
> read/write on HDFS would lead to a AutomicLong.getAndAdd(). AutomicLong does 
> not perform well in multi-threads(let's say more than 30 threads). so it may 
> cause  serious performance issue. during our spark test profile, 32 threads 
> read data from HDFS, about 70% cpu time is spent on 
> FileSystem.Statistics.incrementBytesRead().



--
This message was sent by Atlassian JIRA
(v6.1#6144)

Reply via email to