[ https://issues.apache.org/jira/browse/HDFS-15915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17346591#comment-17346591 ]
Virajith Jalaparti edited comment on HDFS-15915 at 5/18/21, 5:54 AM:
---------------------------------------------------------------------

Thanks for finding this and providing a fix [~shv]. A few questions:
# Nit: Should the default implementation of {{EditLogOutputStream#getLastJournalledTxId}} return -1 instead of 0, since 0 can be a valid txid?
# Nit: In the current implementation, the return value of {{beginTransaction}} is used to get the start time in one place but ignored everywhere else. Should we just make it return void and have the caller track the start time?
# Without this change, the previous implementation seems to have relied on the ordering within the queue {{FSEditLogAsync#editPendingQ}} (elements are added under the FSN lock) to ensure that the order in which edits are assigned txids is the same as the order in which they are processed. Why is that not sufficient when Observer is not used?

was (Author: virajith):
Thanks for finding this and providing a fix [~shv]. A few questions:
# Nit: Should the default implementation of {{EditLogOutputStream#getLastJournalledTxId}} return -1 instead of 0, since 0 can be a valid txid?
# Nit: In the current implementation, the return value of {{beginTransaction}} is used to get the start time in one place but ignored everywhere else. Should we just make it return void and have the caller track the start time?
# Without this change, the previous implementation seems to have relied on the ordering within the queue {{FSEditLogAsync#editPendingQ}} (elements are added under the FSN lock) to ensure that the order in which edits are assigned txids is the same as the order in which they are processed. Why is that not sufficient?
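The first question, about returning -1 rather than 0 from a default implementation, can be illustrated with a minimal sketch. Everything here is hypothetical: {{JournalStream}}, {{TrackingStream}}, {{NO_TXID}} and {{hasJournalled}} are illustration-only names and not Hadoop's actual API. The sketch only assumes, as the comment does, that 0 can be a valid txid.

```java
final class SentinelSketch {
    // Hypothetical sentinel: a negative value cannot collide with a real txid,
    // assuming (as the comment above does) that 0 can be a valid txid.
    static final long NO_TXID = -1L;

    // Hypothetical stand-in for a stream type with a default implementation.
    interface JournalStream {
        default long getLastJournalledTxId() {
            return NO_TXID; // "nothing journalled / not tracked"
        }
    }

    // A stream that does track the last journalled txid.
    static final class TrackingStream implements JournalStream {
        private long last = NO_TXID;

        void journal(long txid) {
            last = txid;
        }

        @Override
        public long getLastJournalledTxId() {
            return last;
        }
    }

    // With a 0 default, this check could not tell "untracked" from "txid 0".
    static boolean hasJournalled(JournalStream s) {
        return s.getLastJournalledTxId() != NO_TXID;
    }

    public static void main(String[] args) {
        JournalStream untracked = new JournalStream() { };
        TrackingStream tracked = new TrackingStream();
        tracked.journal(0L); // txid 0 is treated as valid in this sketch

        System.out.println(hasJournalled(untracked)); // false
        System.out.println(hasJournalled(tracked));   // true
    }
}
```

If the default returned 0 instead, both calls would be indistinguishable to a caller comparing against the default value.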
> Race condition with async edits logging due to updating txId outside of the namesystem log
> ------------------------------------------------------------------------------------------
>
>                 Key: HDFS-15915
>                 URL: https://issues.apache.org/jira/browse/HDFS-15915
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs, namenode
>            Reporter: Konstantin Shvachko
>            Assignee: Konstantin Shvachko
>            Priority: Major
>         Attachments: HDFS-15915-01.patch, HDFS-15915-02.patch, HDFS-15915-03.patch, testMkdirsRace.patch
>
>
> {{FSEditLogAsync}} creates an {{FSEditLogOp}} and populates its fields inside
> {{FSNamesystem.writeLock}}. But one essential field, the transaction id of the
> edits op, remains unset until the operation is scheduled for syncing. At that
> time {{beginTransaction()}} sets {{FSEditLogOp.txid}} and increments the global
> transaction count. On a busy NameNode this event can fall outside the write lock.
> This causes problems for Observer reads. It can also potentially reshuffle
> transactions, so that the Standby applies them in the wrong order.
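The timing gap described in the issue can be sketched in a few lines of Java. This is a simplified, single-threaded model under stated assumptions, not the actual {{FSEditLogAsync}} code or the attached patch: {{Op}}, {{logDeferred}}, {{logEager}}, {{drain}} and the {{UNSET}} sentinel are hypothetical names, and the drain step stands in for the point where an operation is scheduled for syncing.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class TxIdAssignmentSketch {
    static final long UNSET = -1L; // hypothetical marker for "no txid assigned yet"

    // Hypothetical stand-in for an edit log operation.
    static final class Op {
        final String name;
        long txid = UNSET;

        Op(String name) {
            this.name = name;
        }
    }

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Queue<Op> pending = new ArrayDeque<>();
    private long lastTxId = 0L; // global transaction count in this model

    // Sets the op's txid and increments the global count.
    private void beginTransaction(Op op) {
        op.txid = ++lastTxId;
    }

    // Deferred scheme: the op is built and queued under the write lock,
    // but its txid stays unset until drain() runs later.
    Op logDeferred(String name) {
        lock.writeLock().lock();
        try {
            Op op = new Op(name);
            pending.add(op);
            return op;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Eager scheme: the txid is assigned while the write lock is still held.
    Op logEager(String name) {
        lock.writeLock().lock();
        try {
            Op op = new Op(name);
            beginTransaction(op);
            pending.add(op);
            return op;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Models the later scheduling-for-sync step, which runs without the write lock.
    void drain() {
        Op op;
        while ((op = pending.poll()) != null) {
            if (op.txid == UNSET) {
                beginTransaction(op);
            }
        }
    }

    long getLastTxId() {
        return lastTxId;
    }

    public static void main(String[] args) {
        TxIdAssignmentSketch deferred = new TxIdAssignmentSketch();
        Op a = deferred.logDeferred("mkdir /a");
        // The write lock is already released, yet nothing has been assigned.
        System.out.println(a.txid + " " + deferred.getLastTxId()); // -1 0
        deferred.drain();
        System.out.println(a.txid + " " + deferred.getLastTxId()); // 1 1

        TxIdAssignmentSketch eager = new TxIdAssignmentSketch();
        Op b = eager.logEager("mkdir /b");
        // Assigned before the lock was released.
        System.out.println(b.txid + " " + eager.getLastTxId()); // 1 1
    }
}
```

In the deferred scheme there is a window after the lock is released in which the operation exists but neither its txid nor the global count reflects it. Any code that reads the transaction count in that window sees a stale value, which is the kind of gap the issue description points at. The model is deliberately single-threaded so the output is deterministic; in a real NameNode the drain step would run on a separate thread.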