[jira] Commented: (HADOOP-1269) DFS Scalability: namenode throughput impacted because of global FSNamesystem lock

Raghu Angadi (JIRA) Wed, 18 Apr 2007 19:04:37 -0700

[ https://issues.apache.org/jira/browse/HADOOP-1269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12489914 ]


Raghu Angadi commented on HADOOP-1269:
--------------------------------------


Could you briefly describe the load you tested with? It does not seem to include
any read traffic. We would expect at least 50% read traffic (it is safe to assume
everything written is read at least once).

I could not see the actual number of calls, but from the numbers in the
single-thread and 40-thread cases, getAdditionalBlock() seems to consume nearly
an order of magnitude more CPU time than addStoredBlock().

My thoughts:

Of course, I don't have complete details of the actual locking changes, but I
think using concurrent data structures will be trouble (caution: truism :-) ).
We should lock explicitly around logical sets of data structures to guarantee
some logical consistency (e.g. if a block b exists in the datanodeDescriptors
map, then blockMap should contain it, and the 'containingNodes' for this block
in blockMap should contain the descriptor).
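To make the point concrete, here is a minimal sketch, assuming simplified and
hypothetical classes (BlockRegistry, nodeBlocks, etc. are not the actual
FSNamesystem code), of one explicit lock guarding two maps that must agree with
each other: block id to containing datanodes, and datanode to block ids. With two
independent concurrent maps, another thread could observe a block recorded for a
datanode but not yet present in the block map; a single lock around both updates
rules that out.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical, simplified model of two structures that must stay mutually
// consistent: block id -> containing datanodes, and datanode -> block ids.
class BlockRegistry {
    // One explicit lock guards BOTH maps, so no thread can observe a block that is
    // listed for a datanode but missing from blocksMap (or vice versa).
    private final ReentrantLock lock = new ReentrantLock();
    private final Map<Long, Set<String>> blocksMap = new HashMap<>();  // block id -> containingNodes
    private final Map<String, Set<Long>> nodeBlocks = new HashMap<>(); // datanode -> block ids

    // Record that 'node' stores 'blockId'; both maps are updated under the same lock.
    void addStoredBlock(String node, long blockId) {
        lock.lock();
        try {
            blocksMap.computeIfAbsent(blockId, k -> new HashSet<>()).add(node);
            nodeBlocks.computeIfAbsent(node, k -> new HashSet<>()).add(blockId);
        } finally {
            lock.unlock();
        }
    }

    // Drop a datanode and every reference to it in blocksMap, again as one unit.
    void removeNode(String node) {
        lock.lock();
        try {
            Set<Long> blocks = nodeBlocks.remove(node);
            if (blocks == null) {
                return;
            }
            for (long b : blocks) {
                Set<String> holders = blocksMap.get(b);
                if (holders != null) {
                    holders.remove(node);
                    if (holders.isEmpty()) {
                        blocksMap.remove(b);
                    }
                }
            }
        } finally {
            lock.unlock();
        }
    }

    // Check the cross-structure invariant in both directions.
    boolean isConsistent() {
        lock.lock();
        try {
            for (Map.Entry<String, Set<Long>> e : nodeBlocks.entrySet()) {
                for (long b : e.getValue()) {
                    Set<String> holders = blocksMap.get(b);
                    if (holders == null || !holders.contains(e.getKey())) {
                        return false;
                    }
                }
            }
            for (Map.Entry<Long, Set<String>> e : blocksMap.entrySet()) {
                for (String node : e.getValue()) {
                    Set<Long> blocks = nodeBlocks.get(node);
                    if (blocks == null || !blocks.contains(e.getKey())) {
                        return false;
                    }
                }
            }
            return true;
        } finally {
            lock.unlock();
        }
    }

    int containingNodeCount(long blockId) {
        lock.lock();
        try {
            Set<String> holders = blocksMap.get(blockId);
            return holders == null ? 0 : holders.size();
        } finally {
            lock.unlock();
        }
    }
}
```

isConsistent() checks the invariant from the example above; because every
mutation holds the same lock, it can never observe a half-applied update.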

Read/write locks should also be considered, since they let readers proceed in
parallel, especially with considerable read traffic.
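A minimal sketch of that idea using java.util.concurrent.locks.ReentrantReadWriteLock,
assuming a hypothetical LocationTable class (not namenode code): lookups take the
shared read lock and may run in parallel with each other, while updates take the
exclusive write lock.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical block -> locations table guarded by a read/write lock:
// any number of readers may hold the read lock at once; a writer is exclusive.
class LocationTable {
    private final ReadWriteLock rw = new ReentrantReadWriteLock();
    private final Map<Long, List<String>> locations = new HashMap<>();

    // Read path (e.g. a client opening a file): shared lock.
    List<String> getLocations(long blockId) {
        rw.readLock().lock();
        try {
            List<String> l = locations.get(blockId);
            if (l == null) {
                return Collections.emptyList();
            }
            // Return a copy so callers never see later writes made after the lock is released.
            return new ArrayList<>(l);
        } finally {
            rw.readLock().unlock();
        }
    }

    // Write path (e.g. recording a new replica): exclusive lock, blocks readers and writers.
    void addLocation(long blockId, String node) {
        rw.writeLock().lock();
        try {
            locations.computeIfAbsent(blockId, k -> new ArrayList<>()).add(node);
        } finally {
            rw.writeLock().unlock();
        }
    }
}
```

Returning a copy keeps the read critical section short and means callers never
hold a reference into the shared structure outside the lock.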




> DFS Scalability: namenode throughput impacted because of global FSNamesystem 
> lock
> ---------------------------------------------------------------------------------
>
>                 Key: HADOOP-1269
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1269
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: dhruba borthakur
>         Assigned To: dhruba borthakur
>         Attachments: serverThreads1.html, serverThreads40.html
>
>
> I have been running a 2000-node cluster and measuring namenode performance.
> There are quite a few "Calls dropped" messages in the namenode log. The
> namenode machine has 4 CPUs and each CPU is about 30% busy. Profiling the
> namenode shows that the methods that consume the most CPU are addStoredBlock()
> and getAdditionalBlock(). The first method is invoked when a datanode
> confirms the presence of a newly created block. The second method is invoked
> when a DFSClient requests a new block for a file.
> I am attaching two files that were generated by the profiler. 
> serverThreads40.html captures the scenario when the namenode had 40 server 
> handler threads. serverThreads1.html is with 1 server handler thread (with a 
> max_queue_size of 4000).
> In the case with 40 handler threads, the total elapsed time taken
> by FSNamesystem.getAdditionalBlock() is 1957 seconds, whereas the methods
> it invokes (chooseTarget) take only about 97 seconds.
> FSNamesystem.getAdditionalBlock() is blocked on the global FSNamesystem lock
> for the remaining 1860 seconds.
> My proposal is to implement a finer-grained locking model in the namenode. The
> FSNamesystem has a few important data structures, e.g. blocksMap, 
> datanodeMap, leases, neededReplication, pendingCreates, heartbeats, etc. Many 
> of these data structures already have their own lock. My proposal is to have 
> a lock for each one of these data structures. The individual lock will 
> protect the integrity of the contents of the data structure that it protects. 
> The global FSNamesystem lock is still needed to maintain consistency across 
> different data structures.
> If we implement the above proposal, neither addStoredBlock() nor
> getAdditionalBlock() needs to hold the global FSNamesystem lock.
> startFile() and closeFile() still need to acquire the global FSNamesystem
> lock because they need to ensure consistency across multiple data structures.
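The two-level scheme proposed in the description might look roughly like the
following minimal sketch. The class, field and lock names are hypothetical and
the structures are heavily simplified; the fixed lock order (global lock first,
then pendingCreates before blocks) is an assumption added here to avoid
deadlock, not something stated in the proposal.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical two-level locking: a global lock for multi-structure operations,
// plus one lock per structure. Assumed lock order: global -> pending -> blocks.
class TwoLevelNamesystem {
    private final ReentrantLock globalLock = new ReentrantLock();
    private final ReentrantLock pendingLock = new ReentrantLock();
    private final ReentrantLock blocksLock = new ReentrantLock();

    private final Map<String, Long> pendingCreates = new HashMap<>();  // path -> first block id (simplified)
    private final Map<Long, Set<String>> blocksMap = new HashMap<>();  // block id -> datanodes

    // Like addStoredBlock(): touches only blocksMap, so it takes only blocksLock
    // and never waits on the global lock.
    void addStoredBlock(long blockId, String datanode) {
        blocksLock.lock();
        try {
            blocksMap.computeIfAbsent(blockId, k -> new HashSet<>()).add(datanode);
        } finally {
            blocksLock.unlock();
        }
    }

    // Like startFile(): updates two structures together, so it takes the global
    // lock and then both per-structure locks in the fixed order.
    boolean startFile(String path, long firstBlockId) {
        globalLock.lock();
        try {
            pendingLock.lock();
            try {
                blocksLock.lock();
                try {
                    if (pendingCreates.containsKey(path)) {
                        return false; // file is already being created
                    }
                    pendingCreates.put(path, firstBlockId);
                    blocksMap.putIfAbsent(firstBlockId, new HashSet<>());
                    return true;
                } finally {
                    blocksLock.unlock();
                }
            } finally {
                pendingLock.unlock();
            }
        } finally {
            globalLock.unlock();
        }
    }

    int replicaCount(long blockId) {
        blocksLock.lock();
        try {
            Set<String> nodes = blocksMap.get(blockId);
            return nodes == null ? 0 : nodes.size();
        } finally {
            blocksLock.unlock();
        }
    }
}
```

Because startFile() also takes blocksLock before touching blocksMap,
addStoredBlock() can skip the global lock without ever seeing a half-finished
startFile() update. An operation holding only a per-structure lock can still see
other structures change between its steps, which is the consistency concern
raised in the comment.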

-- 
This message is automatically generated by JIRA.
