[ https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909681#comment-16909681 ]
He Xiaoqiao commented on HDFS-14703:
------------------------------------

Thanks [~shv] for your POC patches. I have to say that this is a very clever design for fine-grained global locking. There are still a couple of questions that I do not quite understand, and I look forward to your response.

1. Write concurrency control. Consider a case with two threads running mkdir(/a/b/c/d/e) and delete(/a/b/c). I tried to run this case following the design and the POC patches, but I usually get unstable results: the keys <ida,idb,idc> and <idc,idd,ide> can be located in different RangeGSets when using {{INodeMap#latchWriteLock}}, so the two threads can run concurrently and produce unstable results, even when both operations come from one client and are issued one after the other. As your last comment explains, `deleting a directory should lock all RangeGSets involved`. Is this a special case for delete ops? Sorry for asking this question again.
{quote}
Deleting a directory /a/b/c means deleting the entire sub-tree underneath this directory. We should lock all RangeGSets involved in such deletion, particularly the one containing file f. So f cannot be modified concurrently with the delete.
{quote}

2. {{INode}} gains a member variable {{long[] namespaceKey}} in 0004 of the POC package. I believe this attribute is very useful for partitioning INodes. Meanwhile, does it bring some other potential issues?
* Heap footprint overhead. In a long-running NameNode process, the namespaceKey of most INodes in the directory tree (those visited at least once) may be non-null. If we consider 500M INodes and a {{level}} of 2, it needs more than 8GB of heap.
* When an INode is renamed, its {{namespaceKey}} has to be updated, right? Its parent INode has changed. The POC does not seem to update {{namespaceKey}} once it is non-null.

Is it possible to calculate the namespaceKey for an INode at the time it is used, outside the lock? Of course, that would bring CPU overhead. Please correct me if I am wrong. Thanks.

3. There is no LatchLock unlock in the POC for the #mkdir operation, which seems like a bit of an oversight. In my opinion, childLock has to be released after use, right?

[~shv] Thanks again for your POC patches; I look forward to the next milestone. I would also like to get involved in pushing this feature forward if needed.

> NameNode Fine-Grained Locking via Metadata Partitioning
> -------------------------------------------------------
>
>                 Key: HDFS-14703
>                 URL: https://issues.apache.org/jira/browse/HDFS-14703
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs, namenode
>            Reporter: Konstantin Shvachko
>            Priority: Major
>         Attachments: 001-partitioned-inodeMap-POC.tar.gz, NameNode Fine-Grained Locking.pdf
>
>
> We target to enable fine-grained locking by splitting the in-memory namespace
> into multiple partitions each having a separate lock. Intended to improve
> performance of NameNode write operations.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
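
For the heap-footprint concern raised in question 2 of the comment, the 8GB figure can be reproduced with a back-of-the-envelope calculation. The sketch below is only an illustration, not code from the POC: the class and method names are hypothetical, it assumes the {{namespaceKey}} array holds exactly {{level}} longs per INode, and the 16-byte array header is an assumption that holds for a typical 64-bit HotSpot JVM with compressed oops and may differ elsewhere.

```java
// Hypothetical helper, not part of the POC patches.
public class NamespaceKeyHeapEstimate {
    // A Java long is always 8 bytes.
    static final long BYTES_PER_LONG = 8L;
    // Assumption: per-array header size on a 64-bit HotSpot JVM with compressed oops.
    static final long ASSUMED_ARRAY_HEADER_BYTES = 16L;

    // Bytes used by the key elements alone, assuming one long[level] per INode.
    public static long rawKeyBytes(long inodeCount, int level) {
        return inodeCount * level * BYTES_PER_LONG;
    }

    // Same estimate, including the assumed per-array object header.
    public static long withArrayHeaderBytes(long inodeCount, int level) {
        return inodeCount * (ASSUMED_ARRAY_HEADER_BYTES + level * BYTES_PER_LONG);
    }

    public static void main(String[] args) {
        long inodes = 500_000_000L; // 500M INodes, as in the comment
        int level = 2;              // level of 2, as in the comment
        System.out.println("key data only (bytes): " + rawKeyBytes(inodes, level));
        System.out.println("with assumed array headers (bytes): "
                + withArrayHeaderBytes(inodes, level));
    }
}
```

With 500M INodes and a level of 2, the key elements alone come to 8,000,000,000 bytes, which matches the "more than 8GB" estimate; under the header assumption above, the total would be roughly double that, before counting the reference held in each INode.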