[ https://issues.apache.org/jira/browse/HDFS-6982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14167851#comment-14167851 ]
Andrew Wang commented on HDFS-6982: ----------------------------------- Hi Maysam, thanks for sticking with this, and again apologies for the delay with reviewing this. I especially appreciate the effort to javadoc everything, it helped a lot with reviewing this. One thing that'd be helpful are some basic setup instructions so I can try it out. Since I don't see any modifications to any existing files, I'm also wondering how this is exposed to JMX or on the webUI. Anyway, I did a light pass for mostly code-style and nitty stuff, hopefully I can do a more high-level overview once the above is answered. Nits: * Need to put {{<p/>}} tags in javadoc to get line breaks. Can also use {{@link}} to refer to methods. * Should use SLF4J logging in new classes, i.e. {{LoggerFactory.getLogger()}}, and use the "{}" template to avoid string construction overhead for debug prints * These new classes should probably all be annotated @InterfaceAudience.Private * Bucket could be private rather than package-private * Code style: if/for blocks should always have braces * A few lines longer than 80 chars * I generally prefer doing Precondition checks to manually throwing IllegalStateException, since it's more concise. RollingWindowManager: * There's only a {{getDefaultRollingWindow}} class, no other ways of constructing a RollingWindow. With this in mind, could we rename it to just {{getRollingWindow}}? Same for params like {{defaultWindowLen}} and {{defaultBPW}}, which make me think of the configuration defaults. * Would be good to do Precondition validation on the config params, e.g. greater than zero, numBuckets less than windowLen, evenly divisible, etc. TopConfiguration: * Could we put most of this into hdfs-site.xml, hdfs-default.xml, and DFSConfigKeys instead? It'd also be good to have some light docs in hdfs-default.xml about how to configure this. * If so, propose we use a prefix like "dfs.namenode.top." * How do we configure multiple reporting periods? That TopMetrics constructor isn't called right now. It'd also be good to unify getting and then validating all the configs in a single place, like this class. See NNConf for an example of what I'm thinking of. * Typo in config key: BUKCETS * WEB_PORT and DEFAULT_WEB_PORT seem to be unused, TopConfiguration(Configuration) constructor too * getCmdTotal and getTopMetricsRecordPrefix static getters are only used in TopMetrics, that might be a better home. * Rather than MIN_2_MS, could we have a long array with the default periods, i.e. DEFAULT_REPORTING_PERIODS? We already have a Configuration#getInts methods, could add a Configuration#getLongs method too. TopMetrics: * Typo: "TopMetic singlton" * If you use SLF4J, you can oftentimes skip wrapping things in LOG.isDebugEnabled if you also do the string templating * getMetrics does the isDebugEnabled check, but inside does LOG.info * report, we construct the permStr, but don't actually use it. * report, I don't think we need the catch for {{Throwable t}}, no checked exceptions are being thrown? * getReportingPeriods is unused TopUtil * This stuff isn't shared much, seems like we could just move things to where they're used TopMetricsCollector * Is this used? TestRWM: * Let's do the init in a @BeforeClass block * Rather than grabbing defaultWindowLen and defaultBPW directly, we should set them in the config, pass that in when making the RWManager, then we can verify that in the test that they've taken effect. Can then make these RWM variables private. More tests: * Could also test the JMX interface, e.g. in TestNameNodeMetrics > nntop: top-like tool for name node users > ----------------------------------------- > > Key: HDFS-6982 > URL: https://issues.apache.org/jira/browse/HDFS-6982 > Project: Hadoop HDFS > Issue Type: New Feature > Reporter: Maysam Yabandeh > Assignee: Maysam Yabandeh > Attachments: HDFS-6982.patch, HDFS-6982.v2.patch, nntop-design-v1.pdf > > > In this jira we motivate the need for nntop, a tool that, similarly to what > top does in Linux, gives the list of top users of the HDFS name node and > gives insight about which users are sending majority of each traffic type to > the name node. This information turns out to be the most critical when the > name node is under pressure and the HDFS admin needs to know which user is > hammering the name node and with what kind of requests. Here we present the > design of nntop which has been in production at Twitter in the past 10 > months. nntop proved to have low cpu overhead (< 2% in a cluster of 4K > nodes), low memory footprint (less than a few MB), and quite efficient for > the write path (only two hash lookup for updating a metric). -- This message was sent by Atlassian JIRA (v6.3.4#6332)