[ 
https://issues.apache.org/jira/browse/HDFS-5276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13794380#comment-13794380
 ] 

Andrew Wang commented on HDFS-5276:
-----------------------------------

I really like how this has shaped up. The microbenchmark numbers here very 
impressive (and probably even better on more cores); thanks for doing this 
Binglin. It'd still be good to see this with the actual Spark workload if 
possible.

Colin, overall patch looks good, only some minor comments:

* It'd be nice to include "aggregation" somehow in the name of 
{{StatisticsVIsitor}} and/or {{#visitAll}}, e.g. 
{{StatisticsAggregationVisitor}} and/or {{#aggregate}}, since {{total}} is part 
of the interface.
* {{FileSystem#reset}} is now resetting all the statistics, not just bytes 
written and read. This makes sense to me. It looks like the current behavior is 
just an oversight from when the additional statistics were added in 
HADOOP-6859; maybe [~sureshms] as the original author can comment.
* Resetting by taking the current total, {{#negate}}-ing, then adding is not 
immediately obvious to the reader. Is there a reason we don't just have a 
{{#reset}} method and reset each {{StatisticData}} in turn?
* I'd also prefer just to not reuse the visitor / {{visitAll}} here too, since 
it's kind of weird semantically: {{#total()}} mutates the {{total}} 
{{StatisticsData}} member, and it depends on {{#total()}} being called exactly 
once at the end.

> FileSystem.Statistics got performance issue on multi-thread read/write.
> -----------------------------------------------------------------------
>
>                 Key: HDFS-5276
>                 URL: https://issues.apache.org/jira/browse/HDFS-5276
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.0.4-alpha
>            Reporter: Chengxiang Li
>            Assignee: Colin Patrick McCabe
>         Attachments: DisableFSReadWriteBytesStat.patch, HDFS-5276.001.patch, 
> HDFS-5276.002.patch, HDFSStatisticTest.java, hdfs-test.PNG, jstack-trace.PNG, 
> TestFileSystemStatistics.java, ThreadLocalStat.patch
>
>
> FileSystem.Statistics is a singleton variable for each FS scheme, each 
> read/write on HDFS would lead to a AutomicLong.getAndAdd(). AutomicLong does 
> not perform well in multi-threads(let's say more than 30 threads). so it may 
> cause  serious performance issue. during our spark test profile, 32 threads 
> read data from HDFS, about 70% cpu time is spent on 
> FileSystem.Statistics.incrementBytesRead().



--
This message was sent by Atlassian JIRA
(v6.1#6144)

Reply via email to