[jira] Commented: (HADOOP-1269) DFS Scalability: namenode throughput impacted because of global FSNamesystem lock

dhruba borthakur (JIRA) Wed, 18 Apr 2007 16:11:36 -0700

    [ https://issues.apache.org/jira/browse/HADOOP-1269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12489905 ]


dhruba borthakur commented on HADOOP-1269:
------------------------------------------

Different methods may synchronize on different subsets of the namenode's data, 
but that is not necessary. The global FSNamesystem lock will remain, and it 
will continue to provide consistency across data structures. The other locks 
are per data structure. I want to keep the lock hierarchy very simple, with 
only two levels. My aim is to design a scheme that has a very low risk of 
deadlock, and to optimize only those methods where it is really necessary. The 
rest of the file system will still be protected by the global FSNamesystem 
lock. The two levels are:

1. First acquire the global FSNamesystem lock (if the method needs it).
2. Then acquire one of the per-data-structure locks. These locks are all at 
the same level, so a thread must not acquire one of them while holding 
another. For example, one cannot acquire the lock on neededReplications while 
holding the lock on blocksMap.
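The two-level ordering can be sketched in Java with plain `synchronized` monitors. This is a minimal illustration under stated assumptions, not the actual FSNamesystem code: the class `TwoLevelNamesystem`, the map types, and the methods `removeBlock`, `addBlock`, and `markUnderReplicated` are hypothetical. The global lock is modeled as the object's own monitor, and each data structure is guarded by its own monitor.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the two-level lock hierarchy.
// Field names follow the comment; types and methods are illustrative only.
class TwoLevelNamesystem {
    // Level 2 locks: one monitor per data structure.
    private final Map<Long, String> blocksMap = new HashMap<>();
    private final Map<Long, Integer> neededReplications = new HashMap<>();

    // Cross-structure operation. Level 1 first: the synchronized modifier takes
    // the global lock (this object's monitor). Then level 2 locks are taken one
    // at a time; blocksMap is released before neededReplications is acquired.
    synchronized void removeBlock(long blockId) {
        synchronized (blocksMap) {
            blocksMap.remove(blockId);
        }
        synchronized (neededReplications) {
            neededReplications.remove(blockId);
        }
    }

    // Single-structure operations need only their own level 2 lock. They must
    // never call a synchronized method of this class while holding it, since
    // that would acquire level 1 after level 2 and invert the order.
    void addBlock(long blockId, String datanode) {
        synchronized (blocksMap) {
            blocksMap.put(blockId, datanode);
        }
    }

    void markUnderReplicated(long blockId, int missingReplicas) {
        synchronized (neededReplications) {
            neededReplications.put(blockId, missingReplicas);
        }
    }

    int blockCount() {
        synchronized (blocksMap) { return blocksMap.size(); }
    }

    int neededCount() {
        synchronized (neededReplications) { return neededReplications.size(); }
    }

    public static void main(String[] args) {
        TwoLevelNamesystem ns = new TwoLevelNamesystem();
        ns.addBlock(1L, "dn1");
        ns.markUnderReplicated(1L, 2);
        ns.removeBlock(1L);
        System.out.println(ns.blockCount() + " " + ns.neededCount()); // prints "0 0"
    }
}
```

Because every thread takes locks in the same order (global first, then at most one data-structure lock at a time), no cycle of waiting threads can form, which is what keeps the deadlock risk low.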

For example, there will be a lock for pendingCreates. Inserting a file and its 
blocks into pendingCreates, and removing them, will be protected by the 
pendingCreates lock, so getAdditionalBlock() no longer needs to hold the global 
FSNamesystem lock. Similarly, addStoredBlock() adds a block to the blocksMap, 
which will be protected by the blocksMap lock, so addStoredBlock() does not 
need to hold the FSNamesystem lock either. This allows addStoredBlock() and 
getAdditionalBlock() to execute in parallel.
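A sketch of how the two methods could each take only their own data-structure lock. It assumes simplified types (`pendingCreates` as a map from file name to block IDs, `blocksMap` as a map from block ID to datanode names) and a hypothetical class `FineGrainedNamesystem`; the real method signatures differ, and allocating block IDs from an `AtomicLong` is also an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: getAdditionalBlock() touches only pendingCreates and
// addStoredBlock() touches only blocksMap, so each takes only its own lock.
class FineGrainedNamesystem {
    private final Map<String, List<Long>> pendingCreates = new HashMap<>();
    private final Map<Long, Set<String>> blocksMap = new HashMap<>();
    private final AtomicLong nextBlockId = new AtomicLong(); // assumed ID source

    FineGrainedNamesystem(String... openFiles) {
        for (String f : openFiles) pendingCreates.put(f, new ArrayList<>());
    }

    // Guarded by the pendingCreates lock only; no global lock is held.
    long getAdditionalBlock(String file) {
        long id = nextBlockId.getAndIncrement();
        synchronized (pendingCreates) {
            List<Long> blocks = pendingCreates.get(file);
            if (blocks == null) throw new IllegalStateException("not open: " + file);
            blocks.add(id);
        }
        return id;
    }

    // Guarded by the blocksMap lock only; can run while getAdditionalBlock() runs.
    void addStoredBlock(long blockId, String datanode) {
        synchronized (blocksMap) {
            blocksMap.computeIfAbsent(blockId, k -> new HashSet<>()).add(datanode);
        }
    }

    int pendingBlocks(String file) {
        synchronized (pendingCreates) { return pendingCreates.get(file).size(); }
    }

    int storedBlocks() {
        synchronized (blocksMap) { return blocksMap.size(); }
    }

    // Calls both methods n times each from a thread pool.
    static FineGrainedNamesystem demo(int n) {
        FineGrainedNamesystem ns = new FineGrainedNamesystem("/a");
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < n; i++) {
            final long id = i;
            pool.execute(() -> ns.getAdditionalBlock("/a"));
            pool.execute(() -> ns.addStoredBlock(id, "dn" + (id % 3)));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return ns;
    }

    public static void main(String[] args) {
        FineGrainedNamesystem ns = demo(1000);
        System.out.println("pending=" + ns.pendingBlocks("/a") + " stored=" + ns.storedBlocks());
    }
}
```

Because the two methods synchronize on different monitors, a thread inside getAdditionalBlock() never waits for a thread inside addStoredBlock(), and vice versa. The demo only checks that no updates are lost under concurrency; it does not measure throughput.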

Both pendingCreates and blocksMap are backed by a HashMap. I plan to change 
them to ConcurrentHashMap and measure the performance. This will allow multiple 
invocations of addStoredBlock() to run in parallel.
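One way the ConcurrentHashMap change might look for blocksMap, as a hypothetical sketch (the class `ConcurrentBlocksMap` and its value type are assumptions). `ConcurrentHashMap.computeIfAbsent` is atomic per key, and `ConcurrentHashMap.newKeySet()` returns a concurrent set, so several addStoredBlock() calls can proceed at once without an external monitor.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: blocksMap backed by ConcurrentHashMap instead of a
// HashMap guarded by a single monitor.
class ConcurrentBlocksMap {
    // blockId -> set of datanodes that reported the block
    private final ConcurrentHashMap<Long, Set<String>> blocksMap = new ConcurrentHashMap<>();

    // No synchronized block: computeIfAbsent is atomic per key and the value
    // set is itself concurrent, so simultaneous reports for the same block
    // from different datanodes are not lost.
    void addStoredBlock(long blockId, String datanode) {
        blocksMap.computeIfAbsent(blockId, k -> ConcurrentHashMap.newKeySet()).add(datanode);
    }

    int replicas(long blockId) {
        Set<String> nodes = blocksMap.get(blockId);
        return nodes == null ? 0 : nodes.size();
    }

    int size() { return blocksMap.size(); }

    // Each simulated datanode reports every block, all datanodes in parallel.
    static ConcurrentBlocksMap demo(int blocks, int datanodes) {
        ConcurrentBlocksMap map = new ConcurrentBlocksMap();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int d = 0; d < datanodes; d++) {
            final String dn = "dn" + d;
            pool.execute(() -> {
                for (long b = 0; b < blocks; b++) map.addStoredBlock(b, dn);
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return map;
    }

    public static void main(String[] args) {
        ConcurrentBlocksMap map = demo(1000, 4);
        System.out.println("blocks=" + map.size() + " replicasOfBlock0=" + map.replicas(0));
    }
}
```

This only makes operations on a single map thread-safe. An update that must keep pendingCreates and blocksMap consistent with each other would still need the global FSNamesystem lock.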

We have done a first pass of performance improvements to both addStoredBlock() 
and getAdditionalBlock(). The profiler clearly shows that the time spent 
executing the code of these methods was very small; however, the lock wait 
times were extremely high.
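To separate lock wait time from execution time without a profiler, one could time lock acquisition and the locked body separately. The sketch below is hypothetical: a `java.util.concurrent.locks.ReentrantLock` stands in for the global FSNamesystem lock, the class `LockTimer` is invented for illustration, and a roughly 50-microsecond busy loop stands in for a short method body such as getAdditionalBlock().

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical instrumentation: each call records how long it waited to
// acquire the lock separately from how long it ran while holding it.
class LockTimer {
    private final ReentrantLock globalLock = new ReentrantLock(); // stand-in for the global lock
    final AtomicLong waitNanos = new AtomicLong();
    final AtomicLong holdNanos = new AtomicLong();
    final AtomicLong calls = new AtomicLong();

    void runLocked(Runnable body) {
        long requested = System.nanoTime();
        globalLock.lock();
        long acquired = System.nanoTime();
        try {
            body.run();
        } finally {
            holdNanos.addAndGet(System.nanoTime() - acquired);
            globalLock.unlock();
            waitNanos.addAndGet(acquired - requested);
            calls.incrementAndGet();
        }
    }

    // Busy-waits for roughly the given time to simulate a short method body.
    static void spin(long nanos) {
        long start = System.nanoTime();
        while (System.nanoTime() - start < nanos) {
            // busy-wait
        }
    }

    // Runs `threads` workers, each making `callsPerThread` locked calls.
    static LockTimer demo(int threads, int callsPerThread) {
        LockTimer timer = new LockTimer();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < callsPerThread; i++) {
                    timer.runLocked(() -> spin(50_000)); // about 50 microseconds
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return timer;
    }

    public static void main(String[] args) {
        LockTimer t = demo(40, 50); // 40 workers, echoing the 40-handler-thread run
        System.out.printf("calls=%d wait=%d ms hold=%d ms%n", t.calls.get(),
                TimeUnit.NANOSECONDS.toMillis(t.waitNanos.get()),
                TimeUnit.NANOSECONDS.toMillis(t.holdNanos.get()));
    }
}
```

With many threads contending on one lock, the summed wait time typically grows far larger than the summed hold time, the same pattern described above; the exact numbers depend on the machine and the scheduler.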

Please let me know if this sounds reasonable. 

> DFS Scalability: namenode throughput impacted because of global FSNamesystem lock
> ---------------------------------------------------------------------------------
>
>                 Key: HADOOP-1269
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1269
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: dhruba borthakur
>         Assigned To: dhruba borthakur
>         Attachments: serverThreads1.html, serverThreads40.html
>
>
> I have been running a 2000-node cluster and measuring namenode performance. 
> There are quite a few "Calls dropped" messages in the namenode log. The 
> namenode machine has 4 CPUs and each CPU is about 30% busy. Profiling the 
> namenode shows that the methods that consume the most CPU are addStoredBlock() 
> and getAdditionalBlock(). The first method is invoked when a datanode 
> confirms the presence of a newly created block. The second method is invoked 
> when a DFSClient requests a new block for a file.
> I am attaching two files that were generated by the profiler. 
> serverThreads40.html captures the scenario when the namenode had 40 server 
> handler threads. serverThreads1.html is with 1 server handler thread (with a 
> max_queue_size of 4000).
> In the case when there are 40 handler threads, the total elapsed time taken 
> by FSNamesystem.getAdditionalBlock() is 1957 seconds, whereas the methods 
> that it invokes (chooseTarget) take only about 97 seconds. 
> FSNamesystem.getAdditionalBlock() is blocked on the global FSNamesystem lock 
> for the remaining 1860 seconds.
> My proposal is to implement a finer-grained locking model in the namenode. The 
> FSNamesystem has a few important data structures, e.g. blocksMap, 
> datanodeMap, leases, neededReplication, pendingCreates, heartbeats, etc. Many 
> of these data structures already have their own lock. My proposal is to have 
> a lock for each one of these data structures. Each lock will protect the 
> integrity of the contents of its own data structure. 
> The global FSNamesystem lock is still needed to maintain consistency across 
> different data structures.
> If we implement the above proposal, neither addStoredBlock() nor 
> getAdditionalBlock() needs to hold the global FSNamesystem lock. 
> startFile() and closeFile() still need to acquire the global FSNamesystem 
> lock because they need to ensure consistency across multiple data structures.
