[ 
https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909681#comment-16909681
 ] 

He Xiaoqiao commented on HDFS-14703:
------------------------------------

Thanks [~shv] for your POC patches. I have to say that this is a very clever 
design for fine-grained global locking. There are still a couple of points 
that I do not quite understand, and I look forward to your response.
1. Write concurrency control. Consider a case where two threads run mkdir 
(/a/b/c/d/e) and delete(/a/b/c). I tried to run this case following the design 
and POC patches, but I usually get unstable results, since the keys 
<ida,idb,idc> and <idc,idd,ide> could be located in different RangeGSets when 
using {{INodeMap#latchWriteLock}}. The two threads could then run concurrently 
and produce unstable results, even if the ops come from one client and are 
issued one by one. As your last explanation says, `deleting a directory should 
lock all RangeGSets involved`. Is this a special case for delete ops? Sorry for 
asking this question again.
{quote}
Deleting a directory /a/b/c means deleting the entire sub-tree underneath this 
directory. We should lock all RangeGSets involved in such deletion, 
particularly the one containing file f. So f cannot be modified concurrently 
with the delete.
{quote}
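To make sure I read the quoted rule correctly, here is a minimal sketch of what 
"lock all RangeGSets involved" could look like. It is not taken from the POC: 
{{PartitionLocks}}, {{lockAll}} and {{unlockAll}} are hypothetical names, and 
plain {{ReentrantReadWriteLock}} objects stand in for the per-partition locks. 
The sketch assumes the caller already knows which partition indexes the sub-tree 
touches. Locks are taken in ascending index order so that two multi-partition 
operations cannot deadlock each other.

```java
import java.util.List;
import java.util.TreeSet;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for a set of partitions, each guarded by its own lock. */
final class PartitionLocks {
  private final ReentrantReadWriteLock[] locks;

  PartitionLocks(int partitions) {
    locks = new ReentrantReadWriteLock[partitions];
    for (int i = 0; i < partitions; i++) {
      locks[i] = new ReentrantReadWriteLock();
    }
  }

  /** Write-locks each listed partition once, in ascending index order. */
  int[] lockAll(List<Integer> partitionIds) {
    // TreeSet removes duplicates and sorts, which gives a global lock order.
    TreeSet<Integer> unique = new TreeSet<>(partitionIds);
    int[] ordered = unique.stream().mapToInt(Integer::intValue).toArray();
    for (int id : ordered) {
      locks[id].writeLock().lock();
    }
    return ordered;
  }

  /** Releases the locks taken by lockAll, in reverse order. */
  void unlockAll(int[] ordered) {
    for (int i = ordered.length - 1; i >= 0; i--) {
      locks[ordered[i]].writeLock().unlock();
    }
  }

  /** Reports whether some thread holds the write lock of the partition. */
  boolean isWriteLocked(int id) {
    return locks[id].isWriteLocked();
  }
}
```

With something like this, delete(/a/b/c) would call {{lockAll}} for every 
partition holding a descendant of /a/b/c and release them in a finally block, 
so a concurrent mkdir under that sub-tree would have to wait.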
2. {{INode}} gains a {{long[] namespaceKey}} variable at 0004 in the POC 
package. I believe this attribute is very useful for partitioning INodes. 
However, might it bring some other potential issues?
* Heap footprint overhead. After the NameNode process has been running for a 
long time, the namespaceKey of most INodes in the directory tree (those visited 
at least once) will be non-null. If we assume there are 500M INodes and 
{{level}} is 2, it needs more than 8GB of heap.
* When an INode is renamed, its {{namespaceKey}} has to be updated, right? Its 
parent INode has changed. The POC does not seem to update {{namespaceKey}} once 
it is non-null.
Is it possible to calculate the namespaceKey for an INode at the time it is 
used, outside the lock? Of course, that would add CPU overhead. Please correct 
me if I am wrong. Thanks.
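The heap figure in point 2 comes from simple arithmetic, sketched below. The 
class and method names are made up for illustration, and the 16-byte array 
header is my assumption about a typical 64-bit HotSpot layout, not something 
measured on the POC. The reference field inside each INode is not counted.

```java
/** Rough estimate of the heap used by per-INode long[] keys (illustrative only). */
final class NamespaceKeyHeapEstimate {
  // Assumption: 16-byte array header on a 64-bit JVM; the real value depends on JVM flags.
  static final long ARRAY_HEADER_BYTES = 16;

  /** Bytes used by the key elements alone: inodes * level * 8. */
  static long payloadBytes(long inodes, int level) {
    return inodes * level * Long.BYTES;
  }

  /** Payload plus the assumed per-array header. */
  static long totalBytes(long inodes, int level) {
    return inodes * (ARRAY_HEADER_BYTES + (long) level * Long.BYTES);
  }
}
```

For 500M INodes and {{level}} = 2 the elements alone take 500M * 2 * 8 bytes = 
8 * 10^9 bytes, and under the header assumption above the total doubles.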
3. There is no LatchLock unlock in the POC for the #mkdir operation, which 
looks like an oversight. In my opinion, childLock has to be released after use, 
right?
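What I would expect for point 3 is the usual lock/try/finally shape, sketched 
here with a plain {{ReentrantLock}} standing in for the POC's latch lock. 
{{ChildLockExample}} and {{addChild}} are hypothetical names, not POC code.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Illustrative only: a child lock that is always released after the update. */
final class ChildLockExample {
  private final ReentrantLock childLock = new ReentrantLock();
  private int children = 0;

  /** Adds one child under the lock and returns the new child count. */
  int addChild() {
    childLock.lock();
    try {
      return ++children;
    } finally {
      // Runs on both normal return and exception, so the lock never leaks.
      childLock.unlock();
    }
  }

  /** Reports whether any thread currently holds the child lock. */
  boolean isLocked() {
    return childLock.isLocked();
  }
}
```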
[~shv] Thanks again for your POC patches; I look forward to the next 
milestone. I would also like to get involved in pushing this feature forward if 
needed.

> NameNode Fine-Grained Locking via Metadata Partitioning
> -------------------------------------------------------
>
>                 Key: HDFS-14703
>                 URL: https://issues.apache.org/jira/browse/HDFS-14703
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs, namenode
>            Reporter: Konstantin Shvachko
>            Priority: Major
>         Attachments: 001-partitioned-inodeMap-POC.tar.gz, NameNode 
> Fine-Grained Locking.pdf
>
>
> We aim to enable fine-grained locking by splitting the in-memory namespace 
> into multiple partitions, each having a separate lock. This is intended to 
> improve the performance of NameNode write operations.


