[ https://issues.apache.org/jira/browse/HDFS-14084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758718#comment-16758718 ]
Erik Krogen commented on HDFS-14084:
------------------------------------

Hey folks, I was testing out this patch and noticed an issue that I consider pretty serious. When this is used in a standard DFS client, the {{DefaultMetricsSystem}} singleton has never been initialized, so there is no proper prefix to use for configurations, and numerous bits of testing code are triggered. For example:
{code:title=MetricsSystemImpl#register()}
    final String finalName = // be friendly to non-metrics tests
        DefaultMetricsSystem.sourceName(name2, !monitoring);
{code}
With your patch, this is triggered when {{monitoring}} is false, which is really only intended for testing AFAICT.

This is the first instance I'm aware of that leverages metrics2 for client-side metrics. I think it means that we need to add a {{DefaultMetricsSystem.init("client")}} in the instantiation of {{Client}}. It will need a corresponding {{shutdown()}}, probably in {{Client#close()}}. Unfortunately, both of these methods are expected to be called only once, so we probably need to add some new mechanism for a "conditional initialization" that only initializes the system if this is the first call, and only shuts it down once the last client has been closed.
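To make the "conditional initialization" idea concrete, here is a minimal sketch of a reference-counted guard. The class {{RefCountedInit}} and its {{acquire()}}/{{release()}} methods are hypothetical names and are not part of Hadoop or of the patch. The two hooks stand in for whatever init and shutdown calls on the metrics system end up being used. This is one possible shape for the mechanism, not a proposal for the final API.

```java
import java.util.Objects;

/**
 * Hypothetical helper (not part of Hadoop): runs an init hook on the first
 * acquire() and a shutdown hook on the last release(), so that a process-wide
 * singleton is set up and torn down exactly once however many clients exist.
 */
final class RefCountedInit {
    private final Runnable initHook;      // assumed to wrap the metrics system init call
    private final Runnable shutdownHook;  // assumed to wrap the metrics system shutdown call
    private int refCount = 0;             // guarded by "this"

    RefCountedInit(Runnable initHook, Runnable shutdownHook) {
        this.initHook = Objects.requireNonNull(initHook);
        this.shutdownHook = Objects.requireNonNull(shutdownHook);
    }

    /** Intended for the client's constructor: initializes only on the first call. */
    synchronized void acquire() {
        if (refCount == 0) {
            initHook.run(); // if this throws, refCount stays at 0
        }
        refCount++;
    }

    /** Intended for the client's close(): shuts down only when the last holder leaves. */
    synchronized void release() {
        if (refCount == 0) {
            throw new IllegalStateException("release() without matching acquire()");
        }
        refCount--;
        if (refCount == 0) {
            shutdownHook.run();
        }
    }

    /** Number of clients currently holding the guard. */
    synchronized int holders() {
        return refCount;
    }
}
```

With something like this, each {{Client}} would call {{acquire()}} on a shared static guard when it is instantiated and {{release()}} in {{Client#close()}}. Only the first and last callers would actually touch the metrics system.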
> Need for more stats in DFSClient
> --------------------------------
>
>                 Key: HDFS-14084
>                 URL: https://issues.apache.org/jira/browse/HDFS-14084
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>   Affects Versions: 3.0.0
>            Reporter: Pranay Singh
>            Assignee: Pranay Singh
>            Priority: Minor
>         Attachments: HDFS-14084.001.patch, HDFS-14084.002.patch,
> HDFS-14084.003.patch, HDFS-14084.004.patch, HDFS-14084.005.patch,
> HDFS-14084.006.patch, HDFS-14084.007.patch, HDFS-14084.008.patch,
> HDFS-14084.009.patch, HDFS-14084.010.patch, HDFS-14084.011.patch,
> HDFS-14084.012.patch, HDFS-14084.013.patch, HDFS-14084.014.patch,
> HDFS-14084.015.patch, HDFS-14084.016.patch, HDFS-14084.017.patch,
> HDFS-14084.018.patch
>
>
> The usage of HDFS has changed from being used as a map-reduce filesystem; it
> is now becoming more of a general-purpose filesystem. In most cases the
> issues are with the Namenode, so we have metrics to know the workload or
> stress on the Namenode.
> However, there is a need to collect more statistics for different
> operations/RPCs in DFSClient, to know which RPC operations are taking longer
> or how frequent an operation is. These statistics can be exposed to the
> users of the DFS client, and they can periodically log them or do some sort
> of flow control if the response is slow. This will also help to isolate HDFS
> issues in a mixed environment where, say, Spark, HBase and Impala are running
> together on a node. We can check the throughput of different operations
> across clients and isolate problems caused by a noisy neighbor, network
> congestion, or a shared JVM.
> We have dealt with several problems from the field for which there is no
> conclusive evidence as to what caused the problem. If we had metrics or stats
> in DFSClient we would be better equipped to solve such complex problems.
> List of jiras for reference:
> -------------------------
> HADOOP-15538 HADOOP-15530 (client side deadlock)