Ming Ma created HDFS-9579:
-----------------------------
Summary: Provide bytes-read-by-network-distance metrics at
FileSystem.Statistics level
Key: HDFS-9579
URL: https://issues.apache.org/jira/browse/HDFS-9579
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Ming Ma
For cross-DC distcp and other applications, it is useful to know the traffic
volume at each network distance, so that cross-DC traffic, local-DC remote-rack
traffic, etc. can be told apart.
FileSystem's existing {{bytesRead}} metric tracks all bytes read. To provide
additional metrics for each network distance, we can add new metrics at the
FileSystem level and have {{DFSInputStream}} update them based on the network
distance between the client and the datanode.
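A minimal sketch of the idea, assuming a hypothetical {{Statistics}} stand-in (not the real FileSystem.Statistics class) that keeps one counter per network distance next to the existing total. The bucket layout (distances 0 to 6, larger values folded into the last bucket) and the {{onRead}} hook are illustrative assumptions only.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicLongArray;

class DistanceStatisticsSketch {
    // Hypothetical stand-in for FileSystem.Statistics; NOT the Hadoop class.
    static final class Statistics {
        // Assumption: distances 0..MAX_DISTANCE get their own bucket;
        // larger distances are folded into the last bucket.
        static final int MAX_DISTANCE = 6;

        private final AtomicLong bytesRead = new AtomicLong();
        private final AtomicLongArray bytesReadByDistance =
            new AtomicLongArray(MAX_DISTANCE + 1);

        private static int bucket(int distance) {
            return Math.min(Math.max(distance, 0), MAX_DISTANCE);
        }

        void incrementBytesRead(long n) {
            bytesRead.addAndGet(n);
        }

        void incrementBytesReadByDistance(int distance, long n) {
            bytesReadByDistance.addAndGet(bucket(distance), n);
        }

        long getBytesRead() {
            return bytesRead.get();
        }

        long getBytesReadByDistance(int distance) {
            return bytesReadByDistance.get(bucket(distance));
        }
    }

    // Hypothetical hook a DFSInputStream-like reader would call after each read:
    // the total and the per-distance counter are updated together.
    static void onRead(Statistics stats, int distanceToDatanode, int bytes) {
        if (bytes > 0) {
            stats.incrementBytesRead(bytes);
            stats.incrementBytesReadByDistance(distanceToDatanode, bytes);
        }
    }

    public static void main(String[] args) {
        Statistics stats = new Statistics();
        onRead(stats, 0, 4096); // local host
        onRead(stats, 2, 1024); // same rack
        onRead(stats, 6, 512);  // remote DC
        System.out.println(stats.getBytesRead());             // 5632
        System.out.println(stats.getBytesReadByDistance(2));  // 1024
    }
}
```

Atomic counters keep the sketch safe when several streams share one statistics object; the real implementation may well choose a different concurrency scheme.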
{{DFSClient}} will resolve the client machine's network location as part of its
initialization. It doesn't need to resolve the datanode's network location on
each read, as {{DatanodeInfo}} already has that information.
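To illustrate how a distance could be derived once both locations are known, here is a sketch that assumes locations are topology paths such as "/dc1/rack1/host1" and counts hops from each node up to their closest common ancestor. The {{NetworkDistanceSketch}} class and its {{distance}} helper are hypothetical; Hadoop's own topology code may compute this differently. The client path would be resolved once at startup, and the datanode path is a plain string standing in for what {{DatanodeInfo}} carries.

```java
class NetworkDistanceSketch {
    // Hypothetical helper: hops from a up to the common ancestor plus hops
    // from b up to it. Same host -> 0, same rack -> 2, same DC -> 4, else 6.
    static int distance(String a, String b) {
        String[] pa = split(a);
        String[] pb = split(b);
        int common = 0;
        while (common < pa.length && common < pb.length
                && pa[common].equals(pb[common])) {
            common++;
        }
        return (pa.length - common) + (pb.length - common);
    }

    // "/dc1/rack1/host1" -> ["dc1", "rack1", "host1"]
    private static String[] split(String path) {
        String trimmed = path.startsWith("/") ? path.substring(1) : path;
        return trimmed.isEmpty() ? new String[0] : trimmed.split("/");
    }

    public static void main(String[] args) {
        String client = "/dc1/rack1/host1"; // resolved once, at client init
        System.out.println(distance(client, "/dc1/rack1/host1")); // 0
        System.out.println(distance(client, "/dc1/rack1/host2")); // 2
        System.out.println(distance(client, "/dc1/rack2/host3")); // 4
        System.out.println(distance(client, "/dc2/rack1/host4")); // 6
    }
}
```

Because only string comparisons are involved, the per-read cost is small once the client's own location has been cached.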
There are existing HDFS-specific metrics such as {{ReadStatistics}} and
{{DFSHedgedReadMetrics}}, but these are only accessible via {{DFSClient}} or
{{DFSInputStream}}, which application frameworks such as MR and Tez cannot
reach. That is the benefit of storing these new metrics in
FileSystem.Statistics.
This jira only covers metrics generation by HDFS. The consumption of these
metrics in MR and Tez will be tracked by separate jiras.
We can add similar metrics for the HDFS write path later if necessary.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)