[ 
https://issues.apache.org/jira/browse/HADOOP-1269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HADOOP-1269:
-------------------------------------

    Attachment: chooseTargetLock2.patch

Incorporated Konstantin's review comments.

1. NetworkTopology.isOnSameRack looks at node.getParent(). These are protected 
by the clusterMap lock, so I kept it as it was and did not make any change.

2. NetworkTopology.getDistance(): removed redundant declaration i.

3. Host2NodesMap.add locking issue. This was a good catch. I made this change. 
Fixed indentation.

4. Moved the LOG statement in getAdditionalBlock as suggested.

I also ran randomWriter on a 10 node cluster. The test ran to completion. The 
total elapsed time of the test was 2 hours 40 minutes without this patch and 
2 hours 31 minutes with it. Not a single task error was encountered.


> DFS Scalability: namenode throughput impacted because of global FSNamesystem 
> lock
> ---------------------------------------------------------------------------------
>
>                 Key: HADOOP-1269
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1269
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: chooseTargetLock.patch, chooseTargetLock2.patch, 
> serverThreads1.html, serverThreads40.html
>
>
> I have been running a 2000 node cluster and measuring namenode performance. 
> There are quite a few "Calls dropped" messages in the namenode log. The 
> namenode machine has 4 CPUs and each CPU is about 30% busy. Profiling the 
> namenode shows that the methods that consume the most CPU are 
> addStoredBlock() and getAdditionalBlock(). The first method is invoked when a 
> datanode confirms the presence of a newly created block. The second method is 
> invoked when a DFSClient requests a new block for a file.
> I am attaching two files that were generated by the profiler. 
> serverThreads40.html captures the scenario when the namenode had 40 server 
> handler threads. serverThreads1.html is with 1 server handler thread (with a 
> max_queue_size of 4000).
> In the case when there are 40 handler threads, the total elapsed time taken 
> by FSNamesystem.getAdditionalBlock() is 1957 seconds, whereas the method 
> that it invokes (chooseTarget) takes only about 97 seconds. 
> FSNamesystem.getAdditionalBlock is blocked on the global FSNamesystem lock 
> for all those 1860 seconds.
> My proposal is to implement a finer grain locking model in the namenode. The 
> FSNamesystem has a few important data structures, e.g. blocksMap, 
> datanodeMap, leases, neededReplication, pendingCreates, heartbeats, etc. Many 
> of these data structures already have their own lock. My proposal is to have 
> a lock for each one of these data structures. The individual lock will 
> protect the integrity of the contents of the data structure that it protects. 
> The global FSNamesystem lock is still needed to maintain consistency across 
> different data structures.
> If we implement the above proposal, neither addStoredBlock() nor 
> getAdditionalBlock() needs to hold the global FSNamesystem lock. 
> startFile() and closeFile() still need to acquire the global FSNamesystem 
> lock because they need to ensure consistency across multiple data structures.
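The finer-grained locking proposal quoted above can be sketched in Java as follows. This is a minimal illustration only, not the actual Hadoop code: the class, field, and method names (NamesystemSketch, blocksMapLock, and the simplified map contents) are assumptions made for the example. The point it shows is that a method touching one data structure takes only that structure's lock, while a method spanning several structures still takes the global lock first, in a fixed order, to preserve cross-structure consistency and avoid deadlock.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of per-data-structure locking in a namesystem.
// Lock ordering is always global lock first, then per-structure locks,
// so single-structure and multi-structure operations cannot deadlock.
class NamesystemSketch {
  private final Object globalLock = new Object();    // cross-structure consistency
  private final Object blocksMapLock = new Object(); // protects blocksMap only
  private final Object datanodeMapLock = new Object();
  private final Map<String, String> blocksMap = new HashMap<>();
  private final Map<String, String> datanodeMap = new HashMap<>();

  // Analogous to addStoredBlock(): touches only blocksMap, so it takes
  // just the per-structure lock instead of the global lock.
  void addStoredBlock(String blockId, String datanode) {
    synchronized (blocksMapLock) {
      blocksMap.put(blockId, datanode);
    }
  }

  // Analogous to startFile(): updates multiple structures, so it still
  // acquires the global lock to keep them mutually consistent.
  void startFile(String file, String datanode) {
    synchronized (globalLock) {
      synchronized (blocksMapLock) {
        blocksMap.put(file + "_blk0", datanode);
      }
      synchronized (datanodeMapLock) {
        datanodeMap.put(datanode, file);
      }
    }
  }

  int blockCount() {
    synchronized (blocksMapLock) {
      return blocksMap.size();
    }
  }
}
```

Under this scheme the hot paths measured in the profile (addStoredBlock and getAdditionalBlock) would no longer serialize on the global lock, which is the bottleneck the quoted numbers attribute roughly 1860 of 1957 seconds to.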

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
