[
https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909681#comment-16909681
]
He Xiaoqiao edited comment on HDFS-14703 at 8/17/19 12:41 PM:
--------------------------------------------------------------
Thanks [~shv] for your POC patches. I have to say that this is a very clever
design for fine-grained global locking. There are still a couple of points
that I do not quite understand, and I look forward to your response.
1. Write concurrency control. Consider a case with two threads running
mkdir(/a/b/c/d/e) and delete(/a/b/c). I tried to run this case following the
design and the POC patches, but I usually get unstable results: the keys
<ida,idb,idc> and <idc,idd,ide> can be located in different RangeGSets when
using {{INodeMap#latchWriteLock}}, so the two threads can run concurrently and
produce unstable results, even if both operations come from one client and are
issued one after the other. As your last explanation says, `deleting a
directory should lock all RangeGSets involved`. Is this a special case for
delete ops? Sorry for asking this question again.
{quote}
Deleting a directory /a/b/c means deleting the entire sub-tree underneath this
directory. We should lock all RangeGSets involved in such deletion,
particularly the one containing file f. So f cannot be modified concurrently
with the delete.
{quote}
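To make the question concrete, here is a minimal sketch of what "lock all
RangeGSets involved" could look like. It is not taken from the POC:
{{PartitionLocks}}, {{lockAll}} and {{unlockAll}} are hypothetical names, and
acquiring the locks in ascending partition order is my assumption for avoiding
deadlock between two multi-partition writers.

```java
import java.util.List;
import java.util.TreeSet;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the partitioned INodeMap; not the POC's API.
public class PartitionLocks {
  private final ReentrantReadWriteLock[] locks;

  public PartitionLocks(int partitions) {
    locks = new ReentrantReadWriteLock[partitions];
    for (int i = 0; i < partitions; i++) {
      locks[i] = new ReentrantReadWriteLock();
    }
  }

  // Write-lock every listed partition in ascending index order, so that
  // two writers needing several partitions cannot deadlock each other.
  public TreeSet<Integer> lockAll(List<Integer> partitionIds) {
    TreeSet<Integer> ordered = new TreeSet<>(partitionIds); // sorted, de-duplicated
    for (int id : ordered) {
      locks[id].writeLock().lock();
    }
    return ordered;
  }

  // Release the held partitions in reverse acquisition order.
  public void unlockAll(TreeSet<Integer> held) {
    for (int id : held.descendingSet()) {
      locks[id].writeLock().unlock();
    }
  }

  public boolean isWriteLocked(int id) {
    return locks[id].isWriteLocked();
  }
}
```

With something like this, delete(/a/b/c) would hold every partition under the
sub-tree, so a concurrent mkdir(/a/b/c/d/e) would block on the partition that
holds <idc,idd,ide> until the delete finishes.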
2. {{INode}} introduces a variable {{long[] namespaceKey}} at 0004 in the POC
package. I believe this attribute is very useful for partitioning INodes.
Meanwhile, does it bring some other potential issues?
* Heap footprint overhead. After the NameNode process has been running for a
long time, the namespaceKey of most INodes (those visited at least once) in
the directory tree may be non-null. If we assume 500M INodes and {{level}} is
2, it needs more than 8GB of heap.
* When an INode is renamed, its {{namespaceKey}} has to be updated, right?
Its parent INode has changed. The POC does not seem to update
{{namespaceKey}} once it is non-null.
Would it be possible to compute the namespaceKey for an INode on demand,
outside the lock, each time it is used? Of course, that would add CPU
overhead. Please correct me if I am wrong. Thanks.
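The 8GB figure above can be checked with a back-of-the-envelope calculation.
The sketch below is illustrative only: the class name is hypothetical, and the
16-byte array header is an assumption (typical for a 64-bit HotSpot JVM with
compressed class pointers), not something measured on the POC.

```java
// Rough estimate of heap used by namespaceKey arrays; illustrative only.
public class NamespaceKeyHeapEstimate {
  public static final long BYTES_PER_LONG = Long.BYTES;   // 8
  // Assumption: 16-byte array header per long[]; not a measured value.
  public static final long ASSUMED_ARRAY_HEADER_BYTES = 16;

  // Bytes used by the long elements alone.
  public static long payloadBytes(long inodes, int level) {
    return inodes * level * BYTES_PER_LONG;
  }

  // Payload plus one assumed array header per INode.
  public static long withHeaderBytes(long inodes, int level) {
    return inodes * (ASSUMED_ARRAY_HEADER_BYTES + level * BYTES_PER_LONG);
  }

  public static void main(String[] args) {
    long inodes = 500_000_000L;
    int level = 2;
    System.out.println("payload bytes: " + payloadBytes(inodes, level));
    System.out.println("with headers:  " + withHeaderBytes(inodes, level));
  }
}
```

The long elements alone come to 500M x 2 x 8 bytes = 8,000,000,000 bytes.
Under the header assumption the total doubles, which is why I say "more than"
8GB.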
3. There is no LatchLock unlock in the POC for the mkdir operation; it looks
like an oversight. In my opinion, childLock has to be released after use,
right?
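What I have in mind is the usual lock/try/finally pattern. The sketch below is
hypothetical and does not use the POC's LatchLock API: {{childLock}} is
modelled as a plain {{ReentrantReadWriteLock}}, and {{mkdir}} takes a
{{Runnable}} only to stand in for the real child insertion.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration; childLock stands in for the POC's latch lock.
public class MkdirUnlockSketch {
  public final ReentrantReadWriteLock childLock = new ReentrantReadWriteLock();
  public int created = 0;

  public void mkdir(Runnable addChild) {
    childLock.writeLock().lock();
    try {
      addChild.run();   // placeholder for inserting the new directory INode
      created++;
    } finally {
      // Always release, even if addChild throws.
      childLock.writeLock().unlock();
    }
  }
}
```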
[~shv] Thanks again for your POC patches; I look forward to the next
milestone. I would also like to get involved and help push this feature
forward if needed.
> NameNode Fine-Grained Locking via Metadata Partitioning
> -------------------------------------------------------
>
> Key: HDFS-14703
> URL: https://issues.apache.org/jira/browse/HDFS-14703
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs, namenode
> Reporter: Konstantin Shvachko
> Priority: Major
> Attachments: 001-partitioned-inodeMap-POC.tar.gz, NameNode
> Fine-Grained Locking.pdf
>
>
> We aim to enable fine-grained locking by splitting the in-memory namespace
> into multiple partitions, each with a separate lock. This is intended to
> improve the performance of NameNode write operations.