[ https://issues.apache.org/jira/browse/HDFS-5276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13794380#comment-13794380 ]
Andrew Wang commented on HDFS-5276: ----------------------------------- I really like how this has shaped up. The microbenchmark numbers here very impressive (and probably even better on more cores); thanks for doing this Binglin. It'd still be good to see this with the actual Spark workload if possible. Colin, overall patch looks good, only some minor comments: * It'd be nice to include "aggregation" somehow in the name of {{StatisticsVIsitor}} and/or {{#visitAll}}, e.g. {{StatisticsAggregationVisitor}} and/or {{#aggregate}}, since {{total}} is part of the interface. * {{FileSystem#reset}} is now resetting all the statistics, not just bytes written and read. This makes sense to me. It looks like the current behavior is just an oversight from when the additional statistics were added in HADOOP-6859; maybe [~sureshms] as the original author can comment. * Resetting by taking the current total, {{#negate}}-ing, then adding is not immediately obvious to the reader. Is there a reason we don't just have a {{#reset}} method and reset each {{StatisticData}} in turn? * I'd also prefer just to not reuse the visitor / {{visitAll}} here too, since it's kind of weird semantically: {{#total()}} mutates the {{total}} {{StatisticsData}} member, and it depends on {{#total()}} being called exactly once at the end. > FileSystem.Statistics got performance issue on multi-thread read/write. > ----------------------------------------------------------------------- > > Key: HDFS-5276 > URL: https://issues.apache.org/jira/browse/HDFS-5276 > Project: Hadoop HDFS > Issue Type: Bug > Affects Versions: 2.0.4-alpha > Reporter: Chengxiang Li > Assignee: Colin Patrick McCabe > Attachments: DisableFSReadWriteBytesStat.patch, HDFS-5276.001.patch, > HDFS-5276.002.patch, HDFSStatisticTest.java, hdfs-test.PNG, jstack-trace.PNG, > TestFileSystemStatistics.java, ThreadLocalStat.patch > > > FileSystem.Statistics is a singleton variable for each FS scheme, each > read/write on HDFS would lead to a AutomicLong.getAndAdd(). AutomicLong does > not perform well in multi-threads(let's say more than 30 threads). so it may > cause serious performance issue. during our spark test profile, 32 threads > read data from HDFS, about 70% cpu time is spent on > FileSystem.Statistics.incrementBytesRead(). -- This message was sent by Atlassian JIRA (v6.1#6144)