[jira] [Work logged] (HDFS-16439) Makes calculating maxNodesPerRack simpler

ASF GitHub Bot (Jira) Thu, 27 Jan 2022 06:33:19 -0800


     [ 
https://issues.apache.org/jira/browse/HDFS-16439?focusedWorklogId=716447&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-716447
 ]


ASF GitHub Bot logged work on HDFS-16439:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 27/Jan/22 14:32
            Start Date: 27/Jan/22 14:32
    Worklog Time Spent: 10m 
      Work Description: jianghuazhu commented on pull request #3937:
URL: https://github.com/apache/hadoop/pull/3937#issuecomment-1023271785


   Thanks @ayushtkn for the comment and review.
   This happens when more replicas are requested than the cluster has 
DataNodes, for example when the cluster has 5 DataNodes and the file requires 
7 replicas. This code path is then triggered, because the number of replicas 
of a file cannot exceed the number of nodes in the cluster.
   The change has 2 benefits:
   1. The reset of the replica count becomes more concise and easier to 
understand: (clusterSize - numOfChosen). This is the main purpose.
   2. It saves some calculation steps: (clusterSize - numOfChosen) needs only 
1 step.
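
   The equivalence of the two forms can be sketched as follows. This is a 
minimal standalone illustration, not the actual Hadoop code: the class name 
ReplicaClampSketch and the methods clampOriginal and clampSimplified are 
hypothetical, and only the variable names clusterSize, numOfChosen, and 
numOfReplicas are taken from the issue description.

```java
// Hypothetical sketch; not part of BlockPlacementPolicyDefault.
public class ReplicaClampSketch {

    // Form quoted in the issue description: subtract the overshoot
    // (totalNumOfReplicas - clusterSize) from numOfReplicas.
    static int clampOriginal(int clusterSize, int numOfChosen, int numOfReplicas) {
        int totalNumOfReplicas = numOfChosen + numOfReplicas;
        if (totalNumOfReplicas > clusterSize) {
            numOfReplicas -= (totalNumOfReplicas - clusterSize);
        }
        return numOfReplicas;
    }

    // Proposed form. Expanding the original expression gives
    //   numOfReplicas - (numOfChosen + numOfReplicas - clusterSize)
    //     = clusterSize - numOfChosen
    // so both forms return the same value.
    static int clampSimplified(int clusterSize, int numOfChosen, int numOfReplicas) {
        if (numOfChosen + numOfReplicas > clusterSize) {
            numOfReplicas = clusterSize - numOfChosen;
        }
        return numOfReplicas;
    }

    public static void main(String[] args) {
        // Example from the comment: 5 DataNodes, 7 replicas requested,
        // assuming no node has been chosen yet (numOfChosen = 0).
        System.out.println(clampOriginal(5, 0, 7));   // prints 5
        System.out.println(clampSimplified(5, 0, 7)); // prints 5

        // Brute-force comparison over a small range of inputs.
        for (int clusterSize = 0; clusterSize <= 20; clusterSize++) {
            for (int numOfChosen = 0; numOfChosen <= clusterSize; numOfChosen++) {
                for (int numOfReplicas = 0; numOfReplicas <= 40; numOfReplicas++) {
                    int a = clampOriginal(clusterSize, numOfChosen, numOfReplicas);
                    int b = clampSimplified(clusterSize, numOfChosen, numOfReplicas);
                    if (a != b) {
                        throw new IllegalStateException("forms disagree");
                    }
                }
            }
        }
        System.out.println("both forms agree");
    }
}
```

   When the requested total does not exceed clusterSize, both methods return 
numOfReplicas unchanged, so the simplification only affects how the clamped 
value is written, not its result.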


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 716447)
    Time Spent: 40m  (was: 0.5h)

> Makes calculating maxNodesPerRack simpler
> -----------------------------------------
>
>                 Key: HDFS-16439
>                 URL: https://issues.apache.org/jira/browse/HDFS-16439
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 3.4.0
>            Reporter: JiangHua Zhu
>            Assignee: JiangHua Zhu
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> When creating a new file, the client usually first communicates with the 
> NameNode to obtain the locations of some DataNodes as the target 
> locations for the blocks. When 
> BlockPlacementPolicyDefault#getMaxNodesPerRack() is then executed and the 
> requested number of replicas is so large that it exceeds the number of all 
> nodes in the cluster, the following piece of code is executed:
>      int clusterSize = clusterMap.getNumOfLeaves();
>      int totalNumOfReplicas = numOfChosen + numOfReplicas;
>      if (totalNumOfReplicas > clusterSize) {
>        numOfReplicas -= (totalNumOfReplicas - clusterSize);
>        totalNumOfReplicas = clusterSize;
>      }
> Here, the calculation of numOfReplicas is a little more complicated than 
> necessary. It can be simplified to:
> numOfReplicas = clusterSize - numOfChosen
> This form is easier to understand and also saves a little CPU (though not 
> a lot).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

