[ 
https://issues.apache.org/jira/browse/HDFS-9579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15212860#comment-15212860
 ] 

Brahma Reddy Battula commented on HDFS-9579:
--------------------------------------------

After this went in, I can see one extra log line for each client operation: 
"Adding a new node: "

{noformat}
BLR1000006554:/home/Trunk/hadoop/bin # ./hdfs dfs -put hadoop /test2
16/03/26 15:07:22 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/03/26 15:07:23 INFO net.NetworkTopology: Adding a new node: /default-rack/BLR1000006554
{noformat}

If ScriptBasedMapping is used, the topology script must be configured and 
placed on every machine where HDFS clients are created in order to get the 
correct values. Without it, reads will still work, but the statistics will not 
be correct, since the client will always be resolved to DEFAULT_RACK.
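
To illustrate why the statistics would be skewed, here is a minimal sketch of a path-based network distance. It is not Hadoop's NetworkTopology code; the class name, the helper methods and the rack name /rack1 are hypothetical, and it assumes distance is the number of hops from each location up to their closest common ancestor.

```java
// Simplified stand-in for a path-based network distance; NOT Hadoop's implementation.
class NetworkDistanceSketch {

    // Split a location such as "/rack1/host" into its path components.
    static String[] components(String location) {
        String trimmed = location.startsWith("/") ? location.substring(1) : location;
        return trimmed.split("/");
    }

    // Hops from each location up to the closest common ancestor, summed.
    static int distance(String a, String b) {
        String[] pa = components(a);
        String[] pb = components(b);
        int common = 0;
        while (common < pa.length && common < pb.length
                && pa[common].equals(pb[common])) {
            common++;
        }
        return (pa.length - common) + (pb.length - common);
    }

    public static void main(String[] args) {
        // Assumption: the datanode was registered under the hypothetical rack /rack1.
        String datanode = "/rack1/BLR1000006554";

        // Client resolved by a correctly deployed topology script: local read.
        System.out.println(distance("/rack1/BLR1000006554", datanode));

        // Client without the script falls back to the default rack, so the
        // same local read is counted as an off-rack read.
        System.out.println(distance("/default-rack/BLR1000006554", datanode));
    }
}
```

Under these assumptions the first call yields 0 and the second yields 4, so bytes read locally would be accounted under the wrong distance bucket.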

> Provide bytes-read-by-network-distance metrics at FileSystem.Statistics level
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-9579
>                 URL: https://issues.apache.org/jira/browse/HDFS-9579
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>             Fix For: 3.0.0, 2.9.0
>
>         Attachments: HDFS-9579-10.patch, HDFS-9579-2.patch, 
> HDFS-9579-3.patch, HDFS-9579-4.patch, HDFS-9579-5.patch, HDFS-9579-6.patch, 
> HDFS-9579-7.patch, HDFS-9579-8.patch, HDFS-9579-9.patch, 
> HDFS-9579-branch-2.patch, HDFS-9579.patch, MR job counters.png
>
>
> For cross DC distcp or other applications, it becomes useful to have insight 
> as to the traffic volume for each network distance to distinguish cross-DC 
> traffic, local-DC-remote-rack, etc.
> FileSystem's existing {{bytesRead}} metric tracks all the bytes read. To 
> provide additional metrics for each network distance, we can add them at 
> the FileSystem level and have {{DFSInputStream}} update the value 
> based on the network distance between the client and the datanode.
> {{DFSClient}} will resolve client machine's network location as part of its 
> initialization. It doesn't need to resolve datanode's network location for 
> each read as {{DatanodeInfo}} already has the info.
> There are existing HDFS-specific metrics such as {{ReadStatistics}} and 
> {{DFSHedgedReadMetrics}}, but these metrics are only accessible via 
> {{DFSClient}} or {{DFSInputStream}}, which application frameworks 
> such as MR and Tez cannot get to. That is the benefit of storing these new 
> metrics in FileSystem.Statistics.
> This jira only includes metrics generation by HDFS. The consumption of these 
> metrics by MR and Tez will be tracked in separate jiras.
> We can add similar metrics for HDFS write scenario later if it is necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
