[ https://issues.apache.org/jira/browse/HADOOP-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1306:
-------------------------------------

    Attachment: fineGrainLocks2.patch

Here is a first version of changing the locking so that getAdditionalBlock and 
addStoredBlock occur without any global locks. I have seen that randomwriter 
and DFSIO, which used to fail on a 1000-node cluster, now run successfully 
with this patch.

1. NetworkTopology has reader/writer locks. This map hardly changes but is used 
very frequently. Now, multiple open() calls can proceed in parallel.
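Point 1 boils down to a read-mostly structure guarded by a reader/writer lock. A minimal sketch (the class and method names here are illustrative, not the actual NetworkTopology API):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Sketch of a read-mostly map guarded by a reader/writer lock, in the
 * spirit of the NetworkTopology change: many concurrent readers, rare
 * exclusive writers. Names are hypothetical.
 */
class ReadMostlyMap<K, V> {
  private final Map<K, V> map = new HashMap<K, V>();
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  /** Frequent lookups (e.g. the open() path) share the read lock. */
  V get(K key) {
    lock.readLock().lock();
    try {
      return map.get(key);
    } finally {
      lock.readLock().unlock();
    }
  }

  /** Rare updates (topology changes) take the exclusive write lock. */
  void put(K key, V value) {
    lock.writeLock().lock();
    try {
      map.put(key, value);
    } finally {
      lock.writeLock().unlock();
    }
  }
}
```

Because the topology map hardly ever changes, readers almost never block each other, which is what lets multiple open() calls proceed in parallel.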

2. The pending blocks and pending files are moved into a new class, 
pendingCreates.java, so that they can be locked together.

3. The BlocksMap is protected by a reader/writer lock.

4. In the common case (when the file is still in pendingCreates), 
addStoredBlock() does not acquire the global FSNamesystem lock.
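The fast path in point 4 can be sketched as a check-and-update under pendingCreates' own lock, falling back to the global lock only when the file has already been closed. All names below are illustrative, not the actual FSNamesystem code:

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of the addStoredBlock() fast path: if the block's file is still
 * in pendingCreates, record the replica under pendingCreates' own monitor
 * and skip the global lock entirely. Hypothetical, simplified names.
 */
class PendingCreates {
  private final Map<Long, String> pendingBlockToFile =
      new HashMap<Long, String>();

  synchronized void add(long blockId, String file) {
    pendingBlockToFile.put(blockId, file);
  }

  /** Returns true if the block was handled on the fast path. */
  synchronized boolean recordReplicaIfPending(long blockId, String datanode) {
    if (!pendingBlockToFile.containsKey(blockId)) {
      return false; // file already closed: caller must take the slow path
    }
    // ... update the pending block's replica locations here ...
    return true;
  }
}

class NameSystemSketch {
  final PendingCreates pendingCreates = new PendingCreates();

  void addStoredBlock(long blockId, String datanode) {
    if (pendingCreates.recordReplicaIfPending(blockId, datanode)) {
      return; // common case: no global lock taken
    }
    synchronized (this) { // rare case: full update under the global lock
      // ... update BlocksMap and file state here ...
    }
  }
}
```

Since block reports for files under construction dominate, most addStoredBlock() calls never touch the global lock.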

5. The datanodeMap already had a lock object associated with it to protect 
modifications. This patch ensures that the lock is held in all places where 
the datanodeMap is modified.

6. The Host2NodesMap has its own read/write lock. This will be merged with 
the datanodeMap when we move to a much finer locking model in the future.

This patch is for code review purposes only. Some additional locking is needed 
for processReport (still to be done), but I would like some comments on the 
changes I have made.

I would have liked a finer-grained locking model that allows all 
filesystem methods to be highly concurrent, but that approach was deemed 
too complex for the short term. I am putting out this patch to get feedback on 
whether this medium-term approach is acceptable.





> DFS Scalability: Reduce the number of getAdditionalBlock RPCs on the namenode
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-1306
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1306
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: dhruba borthakur
>         Attachments: fineGrainLocks2.patch
>
>
> One of the most-frequently-invoked RPCs in the namenode is the addBlock() 
> RPC. The DFSClient uses this RPC to allocate one more block for a file that 
> it is currently operating upon. The scalability of the namenode will improve 
> if we can decrease the number of addBlock() RPCs. One idea that we want to 
> discuss here is to make addBlock() return more than one block. This proposal 
> came out of a discussion I had with Ben Reed. 
> Let's say that addBlock() returns n blocks for the file. The namenode already 
> tracks these blocks using the pendingCreates data structure. The client 
> guarantees that these n blocks will be used in order. The client also 
> guarantees that if it cannot use a block (due to whatever reason), it will 
> inform the namenode using the abandonBlock() RPC. These RPCs are already 
> supported.
> Another possible optimization : since the namenode has to allocate n blocks 
> for a file, should it use the same set of datanodes for this set of blocks? 
> My proposal is that if n is a small number (e.g. 3), it is prudent to 
> allocate the same set of datanodes to host all replicas for this set of 
> blocks. This will reduce the CPU spent in chooseTargets().
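The client-side bookkeeping the proposal implies is straightforward: consume the n pre-allocated blocks strictly in order, and abandon (via the existing abandonBlock() RPC) any block that cannot be used. A minimal sketch; the names below are illustrative and not the real DFSClient API:

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Sketch of a client holding n blocks returned by a single addBlock() RPC.
 * Blocks are consumed strictly in order; an unusable block is handed back
 * so the client can report it via abandonBlock(). Hypothetical names.
 */
class PreallocatedBlocks {
  private final Deque<Long> blocks = new ArrayDeque<Long>();

  /** Stash the n block ids returned by one addBlock() RPC. */
  void addFromNamenode(long[] ids) {
    for (long id : ids) {
      blocks.addLast(id);
    }
  }

  /** Next block to write, honouring the in-order guarantee; null if empty. */
  Long nextBlock() {
    return blocks.pollFirst();
  }

  /** True when the client must issue another addBlock() RPC. */
  boolean isEmpty() {
    return blocks.isEmpty();
  }
}
```

With n blocks per RPC, a file of b blocks needs roughly b/n addBlock() calls instead of b, which is where the namenode load reduction comes from.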

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.