[ 
https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16923782#comment-16923782
 ] 

Konstantin Shvachko commented on HDFS-14703:
--------------------------------------------

Good questions, guys, thanks.

??how to handle block reports???
Yes, the blocks are partitioned based on the INodeMap partitions. Each range in 
INodeMap forms a GSet in the BlocksMap, which contains all blocks belonging to 
the files in the given range of inodes. A more formal way of defining 
partitions is to say that _blockKey = <ppId, pId, fileId, blockId>_ and the 
partitioning key ranges for blocks are the same as for INodes.
Block report processing is per storage. My first thought was to process a 
storage report under the global lock (RangeMap lock), which is no worse than 
today. We can further optimize this by splitting the report into INode ranges 
first and then processing them concurrently. The details may be tricky, as with 
anything concerning block reports.
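
To illustrate the "split the report into INode ranges first" idea, here is a
minimal sketch, not the POC code. BlockReportSplitter, BlockKey and
splitByRange are hypothetical names, and the sketch assumes that the full
<ppId, pId, fileId, blockId> key can be derived for every reported block and
that each range is identified by its smallest key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class BlockReportSplitter {
  /** Hypothetical composite key <ppId, pId, fileId, blockId>, compared field by field. */
  static final class BlockKey implements Comparable<BlockKey> {
    final long ppId, pId, fileId, blockId;

    BlockKey(long ppId, long pId, long fileId, long blockId) {
      this.ppId = ppId;
      this.pId = pId;
      this.fileId = fileId;
      this.blockId = blockId;
    }

    @Override
    public int compareTo(BlockKey o) {
      int c = Long.compare(ppId, o.ppId);
      if (c == 0) c = Long.compare(pId, o.pId);
      if (c == 0) c = Long.compare(fileId, o.fileId);
      if (c == 0) c = Long.compare(blockId, o.blockId);
      return c;
    }
  }

  /**
   * Groups reported blocks by the start key of the range containing them.
   * rangeStarts stands in for the RangeMap (assumption): its keys are the
   * smallest keys of each range, its values are not used here.
   */
  static Map<BlockKey, List<BlockKey>> splitByRange(
      TreeMap<BlockKey, ?> rangeStarts, List<BlockKey> report) {
    Map<BlockKey, List<BlockKey>> byRange = new TreeMap<>();
    for (BlockKey block : report) {
      // floorKey returns the greatest range start that is <= the block key.
      BlockKey start = rangeStarts.floorKey(block);
      if (start == null) {
        throw new IllegalArgumentException("block key precedes the first range");
      }
      byRange.computeIfAbsent(start, k -> new ArrayList<>()).add(block);
    }
    return byRange;
  }
}
```

Each resulting group touches exactly one GSet of the BlocksMap, so the groups
could in principle be handed to separate worker threads.
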
??if I hold a Range Map lock, does it mean that I can operate safely???
You should be able to. The RangeMap lock is like the global lock, because 
everybody has to enter it first thing for any operation. One still needs to 
check the RangeGSet lock in case somebody is still modifying this GSet, but new 
threads cannot enter since they will be blocked on obtaining the RangeMap lock.
??is it possible that Range Map lock might have to wait a really long time for 
the Range Set locks to be released???
Not really. You grab the RangeMap lock as soon as you can, then proceed into a 
RangeGSet once nobody else holds the lock on it. GSet locks should drain pretty 
fast since nobody new is entering.
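
A rough sketch of the two-level protocol described in the answers above, under
assumptions of mine rather than a settled design: TwoLevelLocking, onPartition
and onAll are hypothetical names, the RangeMap lock is modeled as a read-write
lock, and ordinary operations hold it only long enough to pick up their
RangeGSet lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class TwoLevelLocking {
  // Stand-in for the RangeMap lock (assumption: read-write semantics).
  private final ReentrantReadWriteLock rangeMapLock = new ReentrantReadWriteLock();
  // One lock per RangeGSet partition.
  private final List<ReentrantLock> rangeLocks = new ArrayList<>();

  TwoLevelLocking(int partitions) {
    for (int i = 0; i < partitions; i++) {
      rangeLocks.add(new ReentrantLock());
    }
  }

  /** Ordinary operation: enter through the RangeMap lock, then work under one range lock. */
  void onPartition(int partition, Runnable op) {
    ReentrantLock rangeLock;
    rangeMapLock.readLock().lock();
    try {
      rangeLock = rangeLocks.get(partition);
      rangeLock.lock();
    } finally {
      // The RangeMap lock is released once the range lock is held.
      rangeMapLock.readLock().unlock();
    }
    try {
      op.run();
    } finally {
      rangeLock.unlock();
    }
  }

  /** Global operation: block new entrants, then wait for every range lock to drain. */
  void onAll(Runnable op) {
    rangeMapLock.writeLock().lock();
    try {
      for (ReentrantLock lock : rangeLocks) {
        lock.lock(); // waits only for operations that entered earlier
      }
      try {
        op.run();
      } finally {
        for (ReentrantLock lock : rangeLocks) {
          lock.unlock();
        }
      }
    } finally {
      rangeMapLock.writeLock().unlock();
    }
  }
}
```

In this sketch a thread inside onAll cannot wait indefinitely on a range lock,
because holders of range locks never need the RangeMap lock again before they
finish, and no new thread can get past the RangeMap lock in the meantime.
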

As I mentioned in the document, the locking scheme needs a separate detailed 
design.

> NameNode Fine-Grained Locking via Metadata Partitioning
> -------------------------------------------------------
>
>                 Key: HDFS-14703
>                 URL: https://issues.apache.org/jira/browse/HDFS-14703
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs, namenode
>            Reporter: Konstantin Shvachko
>            Priority: Major
>         Attachments: 001-partitioned-inodeMap-POC.tar.gz, NameNode 
> Fine-Grained Locking.pdf
>
>
> We aim to enable fine-grained locking by splitting the in-memory namespace 
> into multiple partitions, each having a separate lock. This is intended to 
> improve the performance of NameNode write operations.


