[ https://issues.apache.org/jira/browse/HDFS-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17145743#comment-17145743 ]
Konstantin Shvachko commented on HDFS-15421:
--------------------------------------------

Great collaboration here, guys. I did some digging in the code.
# {{internalReleaseLease()}} can trigger three transactions: {{OP_CLOSE}}, {{OP_SET_GENSTAMP}}, and {{OP_REASSIGN_LEASE}}. With the patch, the first two already handle the genStamp correctly. The last one does not carry a new genStamp.
# I think adding {{applyImpendingGenerationStamp()}} in {{OP_REASSIGN_LEASE}} is incorrect, as it restores the race condition of HDFS-14941. The accompanying comment is also confusing: even though the two transactions are added to the edits under the common lock, their execution on the SBN happens outside the lock and is not atomic.
# I found one more place, {{FSEditLogLoader.addNewBlock()}}, where we need to add {{setGenerationStampIfGreater()}}, because {{addNewBlock()}} adds a block with a new genStamp. Here is the list of all operations that can add a new genStamp (LMK if I missed any):
## OP_ADD
## OP_ADD_BLOCK
## OP_UPDATE_BLOCKS
## OP_SET_GENSTAMP
## OP_CLOSE
## OP_TRUNCATE

I think all of them except OP_ADD_BLOCK use {{setGenerationStampIfGreater()}} with the latest patch. Worth double-checking, of course.

> IBR leak causes standby NN to be stuck in safe mode
> ---------------------------------------------------
>
>                 Key: HDFS-15421
>                 URL: https://issues.apache.org/jira/browse/HDFS-15421
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Kihwal Lee
>            Assignee: Akira Ajisaka
>            Priority: Blocker
>              Labels: release-blocker
>         Attachments: HDFS-15421-000.patch, HDFS-15421-001.patch,
> HDFS-15421.002.patch, HDFS-15421.003.patch, HDFS-15421.004.patch,
> HDFS-15421.005.patch, HDFS-15421.006.patch, HDFS-15421.007.patch
>
>
> After HDFS-14941, the update of the global gen stamp is delayed in certain
> situations. This makes the last set of incremental block reports from append
> "from future", which causes it to be simply re-queued to the pending DN
> message queue, rather than processed to complete the block.
> The last set of IBRs will leak and never be cleaned up until the NN
> transitions to active. The size of {{pendingDNMessages}} constantly grows
> until then.
> If a leak happens while in startup safe mode, the namenode will never be
> able to come out of safe mode on its own.
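To make the leak concrete, here is a heavily simplified Java sketch under stated assumptions. It is not the real {{FSEditLogLoader}} or {{BlockManager}} code: the class {{GenStampLeakSketch}} and its methods {{applyEdit()}}, {{receiveIbr()}} and {{pendingAfterAddBlock()}} are hypothetical, and the model assumes the standby compares each IBR against a single global genStamp and retries every queued IBR after each tailed edit. Only {{setGenerationStampIfGreater()}} and the op names come from the discussion. It shows that if OP_ADD_BLOCK does not advance the global genStamp, an IBR for the new block stays "from future" and is never drained from the pending queue.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.EnumSet;
import java.util.Set;

/**
 * Hypothetical, heavily simplified model of a standby NameNode (SBN) that tails
 * edits and queues incremental block reports (IBRs) that look "from future".
 * These are NOT the real HDFS classes; only the idea of
 * setGenerationStampIfGreater() comes from the discussion.
 */
public class GenStampLeakSketch {

    /** The six ops listed in the comment as able to add a new genStamp. */
    enum Op { OP_ADD, OP_ADD_BLOCK, OP_UPDATE_BLOCKS, OP_SET_GENSTAMP, OP_CLOSE, OP_TRUNCATE }

    private long globalGenStamp;
    /** GenStamps of IBRs parked because they were ahead of globalGenStamp. */
    private final Deque<Long> pendingIbrs = new ArrayDeque<>();
    /** Which ops this model's loader uses to advance the global genStamp. */
    private final Set<Op> opsThatAdvanceGenStamp;

    GenStampLeakSketch(long initialGenStamp, Set<Op> opsThatAdvanceGenStamp) {
        this.globalGenStamp = initialGenStamp;
        this.opsThatAdvanceGenStamp = opsThatAdvanceGenStamp;
    }

    /** Monotonic update: the global genStamp only ever moves forward. */
    void setGenerationStampIfGreater(long genStamp) {
        if (genStamp > globalGenStamp) {
            globalGenStamp = genStamp;
        }
    }

    /** Apply one tailed edit, then retry every queued IBR. */
    void applyEdit(Op op, long genStamp) {
        if (opsThatAdvanceGenStamp.contains(op)) {
            setGenerationStampIfGreater(genStamp);
        }
        // IBRs no longer ahead of the global genStamp are processed (dropped from
        // the queue); the rest stay queued, i.e. they are effectively re-queued.
        pendingIbrs.removeIf(gs -> gs <= globalGenStamp);
    }

    /** An IBR ahead of the SBN's global genStamp is "from future" and gets queued. */
    void receiveIbr(long genStamp) {
        if (genStamp > globalGenStamp) {
            pendingIbrs.addLast(genStamp);
        }
        // Otherwise the IBR would be processed immediately (not modeled here).
    }

    int pendingCount() {
        return pendingIbrs.size();
    }

    /** Scenario: a DN reports a block (genStamp 1001) before the SBN tails OP_ADD_BLOCK. */
    static int pendingAfterAddBlock(boolean addBlockAdvancesGenStamp) {
        Set<Op> ops = EnumSet.of(Op.OP_ADD, Op.OP_UPDATE_BLOCKS, Op.OP_SET_GENSTAMP,
                Op.OP_CLOSE, Op.OP_TRUNCATE);
        if (addBlockAdvancesGenStamp) {
            ops.add(Op.OP_ADD_BLOCK);
        }
        GenStampLeakSketch sbn = new GenStampLeakSketch(1000L, ops);
        sbn.receiveIbr(1001L);                 // queued, since 1001 > 1000
        sbn.applyEdit(Op.OP_ADD_BLOCK, 1001L); // retry after the edit is applied
        return sbn.pendingCount();
    }

    public static void main(String[] args) {
        System.out.println("pending (OP_ADD_BLOCK ignores genStamp): "
                + pendingAfterAddBlock(false));
        System.out.println("pending (OP_ADD_BLOCK uses setGenerationStampIfGreater): "
                + pendingAfterAddBlock(true));
    }
}
```

Running {{main}} prints a pending count of 1 when OP_ADD_BLOCK ignores the genStamp and 0 when it goes through {{setGenerationStampIfGreater()}}. The "if greater" form keeps the global genStamp monotonic, so calling it from every op that carries a genStamp cannot move the stamp backwards.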