[
https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16903374#comment-16903374
]
Konstantin Shvachko commented on HDFS-14703:
--------------------------------------------
Hi [~hexiaoqiao], thanks for reviewing the doc. Very good questions:
# "Cousins" means files like {{/a/b/c/d}} and {{/a/b/m/n}}. Their keys are,
respectively, {{<idb, idc, idd>}} and {{<idb, idm, idn>}}, which share the
common prefix {{<idb>}} and therefore are likely to fall into the same
RangeGSet. In your example {{<ida, idb, idc>}} is the parent of
{{<idb, idc, idd>}}, and this key definition does not guarantee that they are
in the same range.
# Deleting a directory {{/a/b/c}} means deleting the entire sub-tree underneath
this directory. We should lock all RangeGSets involved in such deletion,
particularly the one containing file {{f}}. So {{f}} cannot be
modified concurrently with the delete.
# Just to clarify, RangeMap is the upper-level part of PartitionedGSet, which
maps key ranges into RangeGSets. So there is only one RangeMap and many
RangeGSets. Holding a lock on RangeMap is akin to holding a global lock. You
make a good point that some operations, such as failover, large deletes,
renames, and quota changes, will still require a global lock. The lock on
RangeMap could play the role of such a global lock. This should be defined in
more detail within the design of LatchLock. Ideally we should retain
FSNamesystemLock as a global lock for some operations. This will also help us
gradually switch operations from FSNamesystemLock to LatchLock.
# I don't know what the next bottleneck will be, but you are absolutely
correct that there will be something. For the edits log, I did see while
running my benchmarks that the number of transactions batched together while
journaling was increasing. This is expected and desirable behavior, since
writing large batches to disk is more efficient than many small writes.
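
To make points 1 and 3 more concrete, here is a minimal sketch of the idea, not the actual implementation. It assumes a key is the triple of inode ids <grandparent, parent, self>, compared lexicographically, and that the RangeMap is an ordered map from the first key of each range to a range object with its own lock. The class and method names ({{PartitionedKeySketch}}, {{Range}}, {{addRange}}, {{rangeFor}}), the inode ids, and the range boundaries are all hypothetical and chosen only for illustration.

```java
import java.util.Comparator;
import java.util.TreeMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical illustration only; not the PartitionedGSet from the design doc. */
public class PartitionedKeySketch {

  /** Orders keys lexicographically, element by element. */
  static final Comparator<long[]> KEY_ORDER = (x, y) -> {
    for (int i = 0; i < Math.min(x.length, y.length); i++) {
      int c = Long.compare(x[i], y[i]);
      if (c != 0) {
        return c;
      }
    }
    return Integer.compare(x.length, y.length);
  };

  /** Stand-in for a RangeGSet: a label plus its own lock. */
  static final class Range {
    final String name;
    final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    Range(String name) {
      this.name = name;
    }
  }

  /** Stand-in for the RangeMap: first key of each range -> that range. */
  final TreeMap<long[], Range> rangeMap = new TreeMap<>(KEY_ORDER);

  void addRange(String name, long... startKey) {
    rangeMap.put(startKey, new Range(name));
  }

  /** Returns the range whose start key is the greatest one <= the given key. */
  Range rangeFor(long... key) {
    return rangeMap.floorEntry(key).getValue();
  }

  public static void main(String[] args) {
    // Assumed inode ids: a=1, b=2, c=3, d=4, m=5, n=6.
    PartitionedKeySketch s = new PartitionedKeySketch();
    s.addRange("R0", 0L, 0L, 0L);   // arbitrary boundaries for the example
    s.addRange("R1", 2L, 0L, 0L);
    s.addRange("R2", 3L, 0L, 0L);

    Range d = s.rangeFor(2L, 3L, 4L);   // /a/b/c/d -> <idb, idc, idd>
    Range n = s.rangeFor(2L, 5L, 6L);   // /a/b/m/n -> <idb, idm, idn>
    Range c = s.rangeFor(1L, 2L, 3L);   // /a/b/c   -> <ida, idb, idc>
    System.out.println(d.name + " " + n.name + " " + c.name);

    // Modifying /a/b/c/d needs only the lock of its own range.
    d.lock.writeLock().lock();
    try {
      // ... update the inode here ...
    } finally {
      d.lock.writeLock().unlock();
    }
  }
}
```

With these assumed boundaries the two cousins land in the same range because they share the prefix {{<idb>}}, while the parent {{/a/b/c}} lands in a different one. Different boundaries could split the cousins as well, which is why the comment above says "likely" rather than "guaranteed".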
> NameNode Fine-Grained Locking via Metadata Partitioning
> -------------------------------------------------------
>
> Key: HDFS-14703
> URL: https://issues.apache.org/jira/browse/HDFS-14703
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs, namenode
> Reporter: Konstantin Shvachko
> Priority: Major
> Attachments: NameNode Fine-Grained Locking.pdf
>
>
> We aim to enable fine-grained locking by splitting the in-memory namespace
> into multiple partitions, each having a separate lock. This is intended to
> improve the performance of NameNode write operations.