[ 
https://issues.apache.org/jira/browse/HDFS-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032636#comment-13032636
 ] 

Todd Lipcon commented on HDFS-1332:
-----------------------------------

Hey Nicholas. I thought about the performance impact as well, but I came to the 
conclusion that the node-selection code is not a hot code path. In my 
experience, the NN spends far more time on read operations than on block 
allocation. For example, one production NN whose metrics I have access to 
has performed 3.6M addBlock operations vs 105M FileInfoOps, 30M GetListing 
ops, and 27M GetBlockLocations ops. That is roughly 45 read operations for 
every block allocation.

Additionally, the new code will only get run for nodes which are 
decommissioning, out of space, or highly loaded. Thus it's not likely that it 
will add any appreciable overhead to most chooseTarget operations.
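
To illustrate the point, here is a rough sketch of the shape I have in mind. 
It is not the attached patch: the class, the Node fields, the Reason enum and 
the 2x-average-load threshold are all made up for the example. The only thing 
it is meant to show is that a reason gets recorded on the rejection path alone, 
so a node that is a good target costs nothing extra.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names come from the HDFS code base.
class ExclusionLoggingSketch {

    // Assumed set of rejection reasons, mirroring the cases named above.
    enum Reason { DECOMMISSIONING, OUT_OF_SPACE, TOO_BUSY }

    // Simplified stand-in for a datanode descriptor.
    static final class Node {
        final String name;
        final boolean decommissioning;
        final long remainingBytes;
        final int activeTransfers;

        Node(String name, boolean decommissioning, long remainingBytes, int activeTransfers) {
            this.name = name;
            this.decommissioning = decommissioning;
            this.remainingBytes = remainingBytes;
            this.activeTransfers = activeTransfers;
        }
    }

    // Returns null for a good target, so the common case allocates nothing.
    static Reason whyExcluded(Node n, long blockSize, double avgLoad) {
        if (n.decommissioning) {
            return Reason.DECOMMISSIONING;
        }
        if (n.remainingBytes < blockSize) {
            return Reason.OUT_OF_SPACE;
        }
        // Assumed threshold: more than twice the average load counts as busy.
        if (n.activeTransfers > 2.0 * avgLoad) {
            return Reason.TOO_BUSY;
        }
        return null;
    }

    // Picks up to `wanted` targets; rejected nodes land in `excluded` with a reason.
    static List<Node> chooseTargets(List<Node> candidates, int wanted, long blockSize,
                                    double avgLoad, Map<String, Reason> excluded) {
        List<Node> chosen = new ArrayList<>();
        for (Node n : candidates) {
            if (chosen.size() == wanted) {
                break;
            }
            Reason r = whyExcluded(n, blockSize, avgLoad);
            if (r == null) {
                chosen.add(n);
            } else {
                excluded.put(n.name, r); // only reached for rejected nodes
            }
        }
        return chosen;
    }

    // Builds the log line to emit when fewer targets than requested were found.
    static String describeFailure(int wanted, int found, Map<String, Reason> excluded) {
        return "Not able to place enough replicas, still in need of "
                + (wanted - found) + "; excluded nodes: " + new LinkedHashMap<>(excluded);
    }
}
```

A caller would only build the describeFailure() message when 
chooseTargets() comes back short, so the string formatting also stays off the 
common path.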

Looking at the existing code, it's hardly optimized at all. For example, each 
invocation of chooseRandom() invokes countNumOfAvailableNodes(), which takes 
and releases locks, computes String substrings, etc.



> When unable to place replicas, BlockPlacementPolicy should log reasons nodes 
> were excluded
> ------------------------------------------------------------------------------------------
>
>                 Key: HDFS-1332
>                 URL: https://issues.apache.org/jira/browse/HDFS-1332
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>            Reporter: Todd Lipcon
>            Assignee: Ted Yu
>            Priority: Minor
>              Labels: newbie
>             Fix For: 0.23.0
>
>         Attachments: HDFS-1332.patch
>
>
> Whenever the block placement policy determines that a node is not a "good 
> target" it could add the reason for exclusion to a list, and then when we log 
> "Not able to place enough replicas" we could say why each node was refused. 
> This would help new users who are having issues on pseudo-distributed (eg 
> because their data dir is on /tmp and /tmp is full). Right now it's very 
> difficult to figure out the issue.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
