[jira] [Commented] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning

He Xiaoqiao (JIRA) Wed, 07 Aug 2019 02:18:10 -0700


    [ 
https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901887#comment-16901887
 ]


He Xiaoqiao commented on HDFS-14703:
------------------------------------

Thanks [~shv] for filing this JIRA and planning to push this feature forward. 
It is great work, and I really appreciate it.
 After reading the design document, there are some details I am confused about.
 As the design document says, each inode maps (through its inode key) to one 
RangeMap, and each RangeMap has a separate lock, so operations on different 
RangeMaps can be carried out concurrently.
{quote}The inode key is a fixed length sequence of parent inodeids ending with 
the file inode id itself:
    key(f) = <ppId, pId, selfId>
 Where selfId is the inodeId of file f, pId is the id of its parent, and ppId 
is the id of the parent of the parent. Such definition of a key guarantees that 
not only siblings but also cousins (objects having the same grandparent) are 
partitioned into the same range most of the time
{quote}
Consider the following path: /a/b/c/d/e, where the corresponding inode ids are 
[ida, idb, idc, idd].
 1. How can we guarantee that 'cousins' map to the same range? My first 
impression is that they could map to different RangeMaps, since the inode key 
of idc is <ida, idb, idc> while the inode key of idd is <idb, idc, idd>.
 2. Has operating on a node and one of its ancestors concurrently been 
considered? For instance, with /a/b/c/d/e/f, we could delete inode c and modify 
inode f at the same time if they map to different ranges, since nothing 
guarantees that they map to the same one. This may be a problem.
 3. Which lock will be held for global operations such as HA failover or 
safemode? Do we need to obtain all RangeMap locks?
 4. Is any new bottleneck expected once write throughput improves? I believe 
EditLog OPS will keep increasing; will it become the new bottleneck?
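
To make questions 1 and 2 concrete, here is a minimal Java sketch of how I read 
the key definition. It is only an illustration under my own assumptions: keys 
are compared lexicographically, a range is found by looking up the greatest 
range start that is not larger than the key, and the inode ids, range 
boundaries, and the names InodeKeySketch, key, compareKeys and rangeOf are all 
hypothetical. None of this is taken from the design document or from HDFS code.

```java
import java.util.TreeMap;

class InodeKeySketch {
    // key(f) = <ppId, pId, selfId>, as quoted from the design document.
    static long[] key(long ppId, long pId, long selfId) {
        return new long[] {ppId, pId, selfId};
    }

    // Assumption: keys are ordered lexicographically, field by field.
    static int compareKeys(long[] x, long[] y) {
        for (int i = 0; i < 3; i++) {
            int c = Long.compare(x[i], y[i]);
            if (c != 0) {
                return c;
            }
        }
        return 0;
    }

    // Hypothetical partition table: lowest key of each range -> range name.
    static final TreeMap<long[], String> RANGES =
        new TreeMap<long[], String>(InodeKeySketch::compareKeys);

    static {
        RANGES.put(key(0, 0, 0), "R0");
        RANGES.put(key(15, 0, 0), "R1");
        RANGES.put(key(35, 0, 0), "R2");
    }

    // The range owning a key is the one with the greatest start <= key.
    static String rangeOf(long[] k) {
        return RANGES.floorEntry(k).getValue();
    }

    public static void main(String[] args) {
        // Made-up inode ids for the path /a/b/c/d/e/f.
        long a = 10, b = 20, c = 30, d = 40, e = 50, f = 60;
        // b2 is a sibling of b under a; cousinOfC is a child of b2.
        long b2 = 21, cousinOfC = 32;

        System.out.println("c      -> " + rangeOf(key(a, b, c)));
        System.out.println("cousin -> " + rangeOf(key(a, b2, cousinOfC)));
        System.out.println("d      -> " + rangeOf(key(b, c, d)));
        System.out.println("f      -> " + rangeOf(key(d, e, f)));
    }
}
```

With these made-up boundaries, c and its cousin share the <ida, ...> prefix and 
land in R0, while d lands in R1 and f in R2. That is the case I am asking 
about: an inode and its descendants can end up under different locks.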
Please correct me if I have misunderstood anything. Thanks.
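
For question 3, one option I can imagine is that a global operation takes the 
write lock of every partition in a fixed order and releases them in reverse. 
The sketch below shows only that idea. It is purely my assumption, not 
something the design document states, and the class PartitionLocksSketch and 
its methods are hypothetical names, not HDFS APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class PartitionLocksSketch {
    // One lock per partition (hypothetical stand-in for a per-RangeMap lock).
    private final List<ReentrantReadWriteLock> locks = new ArrayList<>();

    PartitionLocksSketch(int partitions) {
        for (int i = 0; i < partitions; i++) {
            locks.add(new ReentrantReadWriteLock());
        }
    }

    // An operation confined to one partition locks only that partition.
    void runInPartition(int partition, Runnable op) {
        ReentrantReadWriteLock lock = locks.get(partition);
        lock.writeLock().lock();
        try {
            op.run();
        } finally {
            lock.writeLock().unlock();
        }
    }

    // A global operation takes every write lock in index order, so two
    // global operations cannot deadlock each other, then releases in reverse.
    void runGlobal(Runnable op) {
        int held = 0;
        try {
            for (ReentrantReadWriteLock lock : locks) {
                lock.writeLock().lock();
                held++;
            }
            op.run();
        } finally {
            for (int i = held - 1; i >= 0; i--) {
                locks.get(i).writeLock().unlock();
            }
        }
    }

    // Number of partition write locks held by the calling thread.
    int heldByCurrentThread() {
        int n = 0;
        for (ReentrantReadWriteLock lock : locks) {
            if (lock.isWriteLockedByCurrentThread()) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        PartitionLocksSketch sketch = new PartitionLocksSketch(4);
        sketch.runInPartition(2, () ->
            System.out.println("partition op holds " + sketch.heldByCurrentThread()));
        sketch.runGlobal(() ->
            System.out.println("global op holds " + sketch.heldByCurrentThread()));
    }
}
```

If something like this is the intent, the cost of a global operation grows 
with the number of partitions, which is part of why I am asking.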

> NameNode Fine-Grained Locking via Metadata Partitioning
> -------------------------------------------------------
>
>                 Key: HDFS-14703
>                 URL: https://issues.apache.org/jira/browse/HDFS-14703
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs, namenode
>            Reporter: Konstantin Shvachko
>            Priority: Major
>         Attachments: NameNode Fine-Grained Locking.pdf
>
>
> We aim to enable fine-grained locking by splitting the in-memory namespace 
> into multiple partitions, each having a separate lock. This is intended to 
> improve the performance of NameNode write operations.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
