[ https://issues.apache.org/jira/browse/HADOOP-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16299613#comment-16299613 ]
Igor Dvorzhak commented on HADOOP-15124:
----------------------------------------

Thank you for the feedback. I would like to migrate FileSystem.Statistics to the new StorageStatistics backend. I will familiarize myself with the StorageStatistics code and see where it is best to start.

Meanwhile, I have reverted the changes to the public interface in my PR; it now uses both ThreadLocal and LongAdder. With this change there is no improvement in statistics write performance, and there is even a small penalty, but it should be negligible because ThreadLocal.get is much more expensive than LongAdder.add. Still, this change gets rid of all the complicated and synchronized logic for statistics reads, which decreases the wall time of the Statistics code from 6.49% to 1.06% in a 1TB TeraGen job (CPU time increased to 29.4%, but total runtime still decreased from 66 to 62 minutes).

I think it could make sense to submit this PR before the migration to StorageStatistics, because it could be applied to the 3.0 and 3.1 branches and provides some performance benefits.

Additionally, I think that while per-thread statistics are useful, they are not used in regular production job runs (I assume they are more valuable for performance tuning and bottleneck debugging). That is why we could improve statistics write performance by introducing a property that allows per-thread statistics to be disabled. This would achieve the performance characteristics of my initial PR while preserving all the functionality and backward compatibility. What do you think?

Another idea is to extend the Thread class (HadoopThread?) and keep a Statistics field in it instead of using ThreadLocal. This would allow much faster per-thread statistics writes without the need to disable them with a property, but it could be a more involved change that is harder to maintain.
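To make the property idea above concrete, here is a minimal sketch of how an aggregate LongAdder counter could sit next to an optional per-thread counter. It is an illustration only, not the code from the PR: the class StatisticsSketch, the perThreadEnabled flag (standing in for the proposed property), and the method names are all hypothetical.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch; not the actual FileSystem.Statistics implementation. */
final class StatisticsSketch {
    // Aggregate counter shared by all threads; sum() needs no locking.
    private final LongAdder bytesRead = new LongAdder();

    // Hypothetical switch standing in for the proposed configuration property.
    private final boolean perThreadEnabled;

    // Per-thread counter; only touched when perThreadEnabled is true.
    private final ThreadLocal<long[]> threadBytesRead =
        ThreadLocal.withInitial(() -> new long[1]);

    StatisticsSketch(boolean perThreadEnabled) {
        this.perThreadEnabled = perThreadEnabled;
    }

    void incrementBytesRead(long n) {
        bytesRead.add(n);
        if (perThreadEnabled) {
            // ThreadLocal.get() is the step described above as the expensive one,
            // so it is skipped entirely when per-thread statistics are disabled.
            threadBytesRead.get()[0] += n;
        }
    }

    /** Total across all threads. */
    long getBytesRead() {
        return bytesRead.sum();
    }

    /** Value for the calling thread, or 0 when per-thread statistics are disabled. */
    long getThreadBytesRead() {
        return perThreadEnabled ? threadBytesRead.get()[0] : 0L;
    }

    public static void main(String[] args) throws InterruptedException {
        StatisticsSketch stats = new StatisticsSketch(true);
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    stats.incrementBytesRead(1);
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        System.out.println("total bytesRead = " + stats.getBytesRead());
    }
}
```

In this sketch, reading the total is a single LongAdder.sum() call with no synchronization, and disabling the flag removes the ThreadLocal lookup from the write path. A real change would also have to let other threads read the per-thread values, which this sketch does not attempt.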
Also, Netty has implemented the FastThreadLocal and FastThreadLocalThread classes ( https://netty.io/4.1/api/io/netty/util/concurrent/FastThreadLocal.html ) to address the issue of slow ThreadLocal access, which we could consider too. However, I prefer the idea of a dedicated Statistics field in an extended Thread class, because it would perform better than even the FastThreadLocal implementation.

> Slow FileSystem.Statistics counters implementation
> --------------------------------------------------
>
>                 Key: HADOOP-15124
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15124
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: common
>    Affects Versions: 2.9.0, 2.8.3, 2.7.5, 3.0.0
>            Reporter: Igor Dvorzhak
>            Assignee: Igor Dvorzhak
>              Labels: common, filesystem, statistics
>
> While profiling 1TB TeraGen job on Hadoop 2.8.2 cluster (Google Dataproc, 2
> workers, GCS connector) I saw that FileSystem.Statistics code paths Wall time
> is 5.58% and CPU time is 26.5% of total execution time.
> After switching FileSystem.Statistics implementation to LongAdder, consumed
> Wall time decreased to 0.006% and CPU time to 0.104% of total execution time.
> Total job runtime decreased from 66 mins to 61 mins.
> These results are not conclusive, because I didn't benchmark multiple times
> to average results, but regardless of performance gains switching to
> LongAdder simplifies code and reduces its complexity.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org