[ 
https://issues.apache.org/jira/browse/HDFS-16517?focusedWorklogId=745953&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-745953
 ]

ASF GitHub Bot logged work on HDFS-16517:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 22/Mar/22 17:21
            Start Date: 22/Mar/22 17:21
    Worklog Time Spent: 10m 
      Work Description: omalley opened a new pull request #4091:
URL: https://github.com/apache/hadoop/pull/4091


   ### Description of PR
   
   In 2.10, the distance metric is wrong for machines that aren't in the 
NetworkTopology because they aren't running DataNodes. As a result, off-rack 
nodes and on-rack but off-node nodes are both given a weight of 2. In typical 
Hadoop clusters this isn't a big problem, because they don't have clients that 
are on-rack but not running DataNodes. Clusters that are striped (federated 
HDFS going across racks), and clusters that separate compute and storage while 
sharing racks, are both badly affected by this bug.
   
   ### How was this patch tested?
   
   Unit test added.




Issue Time Tracking
-------------------

            Worklog Id:     (was: 745953)
    Remaining Estimate: 0h
            Time Spent: 10m

> In 2.10 the distance metric is wrong for non-DN machines
> --------------------------------------------------------
>
>                 Key: HDFS-16517
>                 URL: https://issues.apache.org/jira/browse/HDFS-16517
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.10.1
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> In 2.10, the metric for the distance between the client and the DataNode is 
> wrong for machines that aren't running DataNodes (i.e., the 
> getWeightUsingNetworkLocation code path). The code works correctly in 3.3+. 
> Currently:
>  
> ||Client||DataNode||getWeight||getWeightUsingNetworkLocation||
> |/rack1/node1|/rack1/node1|0|0|
> |/rack1/node1|/rack1/node2|2|2|
> |/rack1/node1|/rack2/node2|4|2|
> |/pod1/rack1/node1|/pod1/rack1/node2|2|2|
> |/pod1/rack1/node1|/pod1/rack2/node2|4|2|
> |/pod1/rack1/node1|/pod2/rack2/node2|6|4|
>  
> This bug will destroy data locality on clusters where the clients share racks 
> with DataNodes but run on machines that aren't running DataNodes, for 
> example when federated HDFS clusters are striped across racks.
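
The getWeight column in the table above is consistent with counting the hops from each node up to their closest common ancestor in the location path. The following is a minimal sketch of that rule, not the Hadoop implementation: the class `PathDistance` and its methods are hypothetical names introduced here for illustration, and it assumes locations are plain `/`-separated strings such as `/pod1/rack1/node1`.

```java
import java.util.Arrays;

// Hypothetical illustration only; this is not Hadoop's NetworkTopology code.
final class PathDistance {

    // Split "/pod1/rack1/node1" into ["pod1", "rack1", "node1"].
    static String[] components(String path) {
        return Arrays.stream(path.split("/"))
                .filter(s -> !s.isEmpty())
                .toArray(String[]::new);
    }

    // Weight = hops from the client up to the closest common ancestor
    // plus hops from that ancestor down to the DataNode.
    static int weight(String client, String dataNode) {
        String[] a = components(client);
        String[] b = components(dataNode);
        int limit = Math.min(a.length, b.length);
        int common = 0;
        while (common < limit && a[common].equals(b[common])) {
            common++;
        }
        return (a.length - common) + (b.length - common);
    }

    public static void main(String[] args) {
        // The client/DataNode pairs from the table in the issue description.
        String[][] pairs = {
            {"/rack1/node1", "/rack1/node1"},
            {"/rack1/node1", "/rack1/node2"},
            {"/rack1/node1", "/rack2/node2"},
            {"/pod1/rack1/node1", "/pod1/rack1/node2"},
            {"/pod1/rack1/node1", "/pod1/rack2/node2"},
            {"/pod1/rack1/node1", "/pod2/rack2/node2"},
        };
        for (String[] p : pairs) {
            System.out.println(p[0] + " -> " + p[1] + " : " + weight(p[0], p[1]));
        }
    }
}
```

Under these assumptions the six pairs yield 0, 2, 4, 2, 4 and 6, matching the getWeight column. A weight function for clients outside the topology would need to produce the same values to preserve the on-rack versus off-rack distinction.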



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
