hfutatzhanghb commented on code in PR #6737:
URL: https://github.com/apache/hadoop/pull/6737#discussion_r1609154301

########## hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/NamenodeFGL.md: ########## @@ -0,0 +1,210 @@ +<!-- + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--> + +HDFS Namenode Fine-grained Locking +================================== + +<!-- MACRO{toc|fromDepth=0|toDepth=3} --> + +Overview +-------- + +HDFS relies on a single master, the Namenode (NN), as its metadata center. +From an architectural point of view, a few elements make NN the bottleneck of an HDFS cluster: +* NN keeps the entire namespace in memory (directory tree, blocks, Datanode related info, etc.) +* Read requests (`getListing`, `getFileInfo`, `getBlockLocations`) are served from memory. +Write requests (`mkdir`, `create`, `addBlock`, `complete`) update the memory state and write a journal transaction into QJM. +Both types of requests need a locking mechanism to ensure data consistency and correctness. +* All requests are funneled into NN and have to go through the global FS lock. +Each write operation acquires this lock in write mode and holds it until that operation is executed. +This lock mode prevents concurrent execution of write operations even if they involve different branches of the directory tree. 
+ +NN fine-grained locking (FGL) implementation aims to alleviate this bottleneck by allowing concurrency of disjoint write operations. + +JIRA: [HDFS-17366](https://issues.apache.org/jira/browse/HDFS-17366) + +Design +------ +In theory, fully independent operations can be processed concurrently, such as operations involving different subdirectory trees. +As such, NN can split the global lock into the full path lock, just using the full path lock to protect a special subdirectory tree. + +### RPC Categorization + +Roughly, RPC operations handled by NN can be divided into 8 main categories + +| Category | Operations | +|----------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| Involving namespace tree | `mkdir`, `create` (without overwrite), `getFileInfo` (without locations), `getListing` (without locations), `setOwner`, `setPermission`, `getStoragePolicy`, `setStoragePolicy`, `rename`, `isFileClosed`, `getFileLinkInfo`, `setTimes`, `modifyAclEntries`, `removeAclEntries`, `setAcl`, `getAcl`, `setXAttr`, `getXAttrs`, `listXAttrs`, `removeXAttr`, `checkAccess`, `getErasureCodingPolicy`, `unsetErasureCodingPolicy`, `getQuotaUsage`, `getPreferredBlockSize` | +| Involving only blocks | `reportBadBlocks`, `updateBlockForPipeline`, `updatePipeline` | +| Involving only DNs | `registerDatanode`, `setBalancerBandwidth`, `sendHeartbeat` | +| Involving both namespace tree & blocks | `getBlockLocation`, `create` (with overwrite), `append`, `setReplication`, `abandonBlock`, `addBlock`, `getAdditionalDatanode`, `complete`, `concat`, 
`truncate`, `delete`, `getListing` (with locations), `getFileInfo` (with locations), `recoverLease`, `listCorruptFileBlocks`, `fsync`, `commitBlockSynchronization`, `RedundancyMonitor`, `processMisReplicatedBlocks` | +| Involving both DNs & blocks | `getBlocks`, `errorReport` | +| Involving namespace tree, DNs & blocks | `blockReport`, `blockReceivedAndDeleted`, `HeartbeatManager`, `Decommission` | Review Comment: @kokonguyen191 @ZanderXu Sir, i did not see any namespace tree modification in the scope of writeLock() in method HeartbeatManager#heartbeatCheck. Please correct me if i mistook it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
