[ https://issues.apache.org/jira/browse/HADOOP-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16325929#comment-16325929 ]
Igor Dvorzhak edited comment on HADOOP-15124 at 1/15/18 6:51 AM:
-----------------------------------------------------------------

To simplify and improve the PR, I have:
- removed the flag stubbing for disabling per-thread stats;
- re-implemented FileSystemStorageStatistics using an EnumMap as the backend (similar to the S3 statistics implementations);
- re-implemented FileSystem.Statistics using FileSystemStorageStatistics as the backend (preserving all functionality and backward compatibility).

This decreased performance slightly compared to the non-EnumMap-based implementation, but it is still faster than the current implementation.

Please take a look at the PR and advise how to move this change forward.

was (Author: medb):
To simplify the change, I have:
- removed the flag stubbing for disabling per-thread stats;
- re-implemented FileSystemStorageStatistics using an EnumMap as the backend (similar to the S3 statistics implementations);
- re-implemented FileSystem.Statistics using FileSystemStorageStatistics as the backend (preserving all functionality and backward compatibility).

This decreased performance slightly compared to the non-EnumMap-based implementation, but it is still faster than the current implementation.

Please take a look and advise how to move this change forward.

> Slow FileSystem.Statistics counters implementation
> --------------------------------------------------
>
>                 Key: HADOOP-15124
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15124
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: common
>    Affects Versions: 2.9.0, 2.8.3, 2.7.5, 3.0.0
>            Reporter: Igor Dvorzhak
>            Assignee: Igor Dvorzhak
>            Priority: Major
>              Labels: common, filesystem, statistics
>
> While profiling a 1 TB TeraGen job on a Hadoop 2.8.2 cluster (Google Dataproc,
> 2 workers, GCS connector), I saw that FileSystem.Statistics code paths took
> 5.58% of wall time and 26.5% of CPU time of the total execution time.
> After switching the FileSystem.Statistics implementation to LongAdder, the
> wall time consumed decreased to 0.006% and CPU time to 0.104% of total
> execution time. Total job runtime decreased from 66 minutes to 61 minutes.
> These results are not conclusive, because I didn't repeat the benchmark
> multiple times to average the results, but regardless of the performance
> gains, switching to LongAdder simplifies the code.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
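For readers less familiar with the approach discussed in this issue, here is a minimal Java sketch of statistics counters kept in an EnumMap with one LongAdder per key. It is an illustration under stated assumptions, not the code from the PR and not Hadoop's API: the class names `EnumMapStatistics` and `EnumMapStatisticsDemo`, the `Key` constants and all method names are hypothetical, and it assumes Java 8 or later, where `java.util.concurrent.LongAdder` is available.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.LongAdder;

/**
 * Hypothetical sketch: EnumMap-backed LongAdder counters. Names are
 * illustrative only and do not correspond to Hadoop classes.
 */
final class EnumMapStatistics {

  /** Hypothetical keys, loosely modeled on FileSystem.Statistics counters. */
  enum Key { BYTES_READ, BYTES_WRITTEN, READ_OPS, LARGE_READ_OPS, WRITE_OPS }

  // Fully populated in the constructor and never structurally modified
  // afterwards, so concurrent lookups need no locking; all mutation happens
  // inside the thread-safe LongAdder values.
  private final Map<Key, LongAdder> counters = new EnumMap<>(Key.class);

  EnumMapStatistics() {
    for (Key key : Key.values()) {
      counters.put(key, new LongAdder());
    }
  }

  /** Adds delta to a counter; LongAdder stripes contended updates. */
  void increment(Key key, long delta) {
    counters.get(key).add(delta);
  }

  /** Current total; not atomic with respect to concurrent writers. */
  long get(Key key) {
    return counters.get(key).sum();
  }

  /** Copies every current total into a new, independent map. */
  Map<Key, Long> snapshot() {
    Map<Key, Long> copy = new EnumMap<>(Key.class);
    for (Map.Entry<Key, LongAdder> e : counters.entrySet()) {
      copy.put(e.getKey(), e.getValue().sum());
    }
    return copy;
  }

  /** Resets all counters to zero. */
  void reset() {
    counters.values().forEach(LongAdder::reset);
  }
}

class EnumMapStatisticsDemo {
  public static void main(String[] args) throws InterruptedException {
    EnumMapStatistics stats = new EnumMapStatistics();
    Thread[] workers = new Thread[4];
    for (int i = 0; i < workers.length; i++) {
      // Each worker simulates 1000 reads of 4096 bytes.
      workers[i] = new Thread(() -> {
        for (int j = 0; j < 1000; j++) {
          stats.increment(EnumMapStatistics.Key.READ_OPS, 1);
          stats.increment(EnumMapStatistics.Key.BYTES_READ, 4096);
        }
      });
      workers[i].start();
    }
    for (Thread t : workers) {
      t.join(); // after join(), all updates from that worker are visible
    }
    // prints {BYTES_READ=16384000, BYTES_WRITTEN=0, READ_OPS=4000, LARGE_READ_OPS=0, WRITE_OPS=0}
    System.out.println(stats.snapshot());
  }
}
```

LongAdder spreads contended updates across internal cells and only combines them when `sum()` is called, which keeps increments cheap on hot paths; the trade-off is that `sum()` is not an atomic snapshot while other threads are still writing. Pre-populating the EnumMap in the constructor keeps the map itself read-only afterwards, so lookups need no extra synchronization.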