[ 
https://issues.apache.org/jira/browse/HDFS-7433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278791#comment-14278791
 ] 

Daryn Sharp commented on HDFS-7433:
-----------------------------------

[~mingma], you are correct that the scan is not guaranteed to check 
nodes.per.interval nodes when it is at or near the end of the list.  I debated 
adding the complexity of wrapping around, but I chose to keep it simple.  
Otherwise you need to track cases such as "I scanned no nodes, but I saw some 
in decom with the current scan id, so I'll bump the scan number and loop around 
for a second pass" or "I scanned fewer than nodes.per.interval, but saw some in 
decom with the current scan id prior to the first scanned (lower id), so I'll 
bump the scan id and loop around and re-scan until I reach the first previously 
scanned, to avoid double scans of nodes".  I went the simple route, but if you 
feel strongly that it's an issue, I'll revisit the implementation.
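
To make the trade-off concrete, here is a minimal sketch of the two scan 
strategies.  It is not the code from the patch: the class name, the list of 
node ids, and both method names are hypothetical, and it leaves out the scan-id 
bookkeeping described above.  It only shows why the simple variant can return 
fewer than nodes.per.interval nodes near the end of the list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the HDFS-7433 implementation.
class IntervalScanSketch {
    private final List<String> nodes;     // node ids in iteration order (assumed)
    private final int nodesPerInterval;   // stands in for nodes.per.interval
    private int cursor = 0;               // index of the next node to scan

    IntervalScanSketch(List<String> nodes, int nodesPerInterval) {
        this.nodes = nodes;
        this.nodesPerInterval = nodesPerInterval;
    }

    // Simple route: stop at the end of the list and restart from the head
    // on the next interval. May scan fewer than nodesPerInterval nodes.
    List<String> scanSimple() {
        List<String> scanned = new ArrayList<>();
        int end = Math.min(cursor + nodesPerInterval, nodes.size());
        for (int i = cursor; i < end; i++) {
            scanned.add(nodes.get(i));
        }
        cursor = (end == nodes.size()) ? 0 : end;
        return scanned;
    }

    // Wrap-around route: continue from the head within the same interval,
    // never scanning more than one full pass of the list.
    List<String> scanWrapping() {
        List<String> scanned = new ArrayList<>();
        int count = Math.min(nodesPerInterval, nodes.size());
        for (int i = 0; i < count; i++) {
            scanned.add(nodes.get(cursor));
            cursor = (cursor + 1) % nodes.size();
        }
        return scanned;
    }
}
```

With five nodes and three nodes per interval, the simple variant scans three 
nodes, then two, then starts over at the head, while the wrapping variant scans 
three nodes every interval.  The index arithmetic works here because the sketch 
uses an ordered list; with a real map you would need the extra state described 
above to avoid double scans.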

> DatanodeManager#datanodeMap should be a HashMap, not a TreeMap, to optimize 
> lookup performance
> ----------------------------------------------------------------------------------------------
>
>                 Key: HDFS-7433
>                 URL: https://issues.apache.org/jira/browse/HDFS-7433
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>            Priority: Critical
>         Attachments: HDFS-7433.patch, HDFS-7433.patch
>
>
> The datanode map is currently a {{TreeMap}}.  For many thousands of 
> datanodes, tree lookups are ~10X more expensive than a {{HashMap}}.  
> Insertions and removals are up to 100X more expensive.
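
As a rough illustration of the change the issue proposes, the sketch below 
fills a {{TreeMap}} and a {{HashMap}} through the common {{Map}} interface.  
The class, the {{NodeInfo}} record, and the key format are hypothetical and do 
not come from DatanodeManager.  {{TreeMap}} is a red-black tree with O(log n) 
lookup, insertion, and removal; {{HashMap}} is expected O(1) for the same 
operations but gives up sorted iteration order.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not the DatanodeManager code.
class DatanodeMapSketch {
    // Stand-in for the per-datanode record kept by the namenode (assumed).
    static final class NodeInfo {
        final String host;
        NodeInfo(String host) { this.host = host; }
    }

    // Fills any Map implementation with `count` synthetic entries, so the
    // caller can swap TreeMap for HashMap without touching this code.
    static Map<String, NodeInfo> populate(Map<String, NodeInfo> map, int count) {
        for (int i = 0; i < count; i++) {
            String id = String.format("dn-%05d", i);
            map.put(id, new NodeInfo("host" + i));
        }
        return map;
    }

    // Convenience factories for the two variants being compared.
    static Map<String, NodeInfo> treeBacked(int count) {
        return populate(new TreeMap<String, NodeInfo>(), count);
    }

    static Map<String, NodeInfo> hashBacked(int count) {
        return populate(new HashMap<String, NodeInfo>(), count);
    }
}
```

Both maps answer {{get}} identically; only the cost per operation and the 
iteration order differ.  Any code that relied on the sorted order of the 
{{TreeMap}} would need its own ordering after the switch.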



