[ https://issues.apache.org/jira/browse/HADOOP-1269 ]
dhruba borthakur updated HADOOP-1269:
-------------------------------------

    Attachment: chooseTargetLock2.patch

Incorporated Konstantin's review comments.

1. NetworkTopology.isOnSameRack looks at node.getParent(). These accesses are protected by the clusterMap lock, so I kept it as it was and did not make any change.
2. NetworkTopology.getDistance(): removed the redundant declaration of i.
3. Host2NodesMap.add locking issue: this was a good catch. I made this change and fixed the indentation.
4. Moved the LOG statement in getAdditionalBlock as suggested.

I also ran randomWriter on a 10-node cluster. The test ran to completion. The total elapsed time of the test was 2 hours 40 minutes without this patch and 2 hours 31 minutes with it. Not a single task error was encountered.

> DFS Scalability: namenode throughput impacted because of global FSNamesystem lock
> ---------------------------------------------------------------------------------
>
>                 Key: HADOOP-1269
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1269
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: chooseTargetLock.patch, chooseTargetLock2.patch, serverThreads1.html, serverThreads40.html
>
>
> I have been running a 2000-node cluster and measuring namenode performance. There are quite a few
> "Calls dropped" messages in the namenode log. The namenode machine has 4 CPUs, and each CPU is
> about 30% busy. Profiling the namenode shows that the methods that consume the most CPU are
> addStoredBlock() and getAdditionalBlock(). The first method is invoked when a datanode confirms
> the presence of a newly created block. The second is invoked when a DFSClient requests a new
> block for a file.
> I am attaching two files that were generated by the profiler. serverThreads40.html captures the
> scenario when the namenode had 40 server handler threads. serverThreads1.html is with 1 server
> handler thread (with a max_queue_size of 4000).
> In the case when there are 40 handler threads, the total elapsed time taken by
> FSNamesystem.getAdditionalBlock() is 1957 seconds, whereas the method that it invokes
> (chooseTarget) takes only about 97 seconds. FSNamesystem.getAdditionalBlock() is blocked on the
> global FSNamesystem lock for the remaining 1860 seconds.
> My proposal is to implement a finer-grained locking model in the namenode. The FSNamesystem has
> a few important data structures, e.g. blocksMap, datanodeMap, leases, neededReplication,
> pendingCreates, heartbeats, etc. Many of these data structures already have their own lock. My
> proposal is to have a lock for each one of these data structures. Each individual lock will
> protect the integrity of the contents of the data structure it guards. The global FSNamesystem
> lock is still needed to maintain consistency across different data structures.
> If we implement the above proposal, neither addStoredBlock() nor getAdditionalBlock() needs to
> hold the global FSNamesystem lock. startFile() and closeFile() still need to acquire the global
> FSNamesystem lock because they need to ensure consistency across multiple data structures.
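Below is a minimal, self-contained Java sketch of the locking model proposed above: each data structure is guarded by its own lock, operations that touch a single structure (addStoredBlock, getAdditionalBlock) take only that structure's lock, and operations that must keep several structures mutually consistent (such as closeFile) also take the global FSNamesystem lock. This is not the actual Hadoop code or the attached patch; the class, the simplified map types, and the method signatures are hypothetical placeholders, and only the names blocksMap, pendingCreates, addStoredBlock, getAdditionalBlock, and closeFile come from the issue description.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only, not the FSNamesystem implementation.
public class FineGrainedLockingSketch {

  // Simplified stand-ins for two of the data structures named in the
  // issue; the element types are placeholders.
  private final Map<Long, String> blocksMap = new HashMap<Long, String>();
  private final Map<String, Long> pendingCreates = new HashMap<String, Long>();

  // One lock per data structure.
  private final Object blocksMapLock = new Object();
  private final Object pendingCreatesLock = new Object();

  // Global lock: taken only by operations that must see a consistent
  // view across several data structures at once.
  private final Object fsNamesystemLock = new Object();

  // addStoredBlock(): a datanode confirms a newly created block.
  // Touches only blocksMap, so only the blocksMap lock is needed.
  public void addStoredBlock(long blockId, String datanode) {
    synchronized (blocksMapLock) {
      blocksMap.put(blockId, datanode);
    }
  }

  // getAdditionalBlock(): a DFSClient asks for a new block of a file
  // under construction. Touches only pendingCreates, so the global
  // lock is not needed on this hot path.
  public long getAdditionalBlock(String file, long newBlockId) {
    synchronized (pendingCreatesLock) {
      pendingCreates.put(file, newBlockId);
      return newBlockId;
    }
  }

  // closeFile(): moves state from pendingCreates into blocksMap and so
  // spans two data structures. It takes the global lock to stay
  // consistent with other multi-structure operations, plus each
  // individual lock while touching that structure.
  public void closeFile(String file, String datanode) {
    synchronized (fsNamesystemLock) {
      Long blockId;
      synchronized (pendingCreatesLock) {
        blockId = pendingCreates.remove(file);
      }
      if (blockId != null) {
        synchronized (blocksMapLock) {
          blocksMap.put(blockId, datanode);
        }
      }
    }
  }
}
{code}

The point of the sketch is the lock layout only: the global lock no longer serializes the hot single-structure paths, while multi-structure operations remain serialized against each other by the global lock and therefore still see consistent state across the structures they touch.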