[ 
https://issues.apache.org/jira/browse/HDFS-15915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17346591#comment-17346591
 ] 

Virajith Jalaparti edited comment on HDFS-15915 at 5/18/21, 5:58 AM:
---------------------------------------------------------------------

Thanks for finding this and providing a fix [~shv]. A few questions:
# Nit: Should the default implementation of 
{{EditLogOutputStream#getLastJournalledTxId}} return a value of -1 instead of 0 
as 0 can be a valid txid?
# Nit: In the current implementation, the return value of {{beginTransaction}} 
is used to get the start time in one place but ignored in other places. Should 
we just make it return void and force the caller to track the start time?
# Without this change, the previous implementation seems to have relied on the 
ordering within the queue ({{FSEditLogAsync#editPendingQ}}, to which elements 
are added under the FSN lock) to ensure that the order in which edits are 
assigned txids is the same as the order in which they are processed. Why is 
that not sufficient when Observer is not used?
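
To make the first nit concrete, here is a minimal sketch of the sentinel idea. The classes below are hypothetical stand-ins, not the real {{EditLogOutputStream}}, and the {{NO_TXID}} constant and {{lastJournalled}} helper are names invented for illustration. It only shows that a default of -1 lets a caller tell "nothing tracked" apart from a stream that journalled txid 0.

```java
import java.util.OptionalLong;

public class LastJournalledTxIdSketch {
    /** Hypothetical sentinel: "this stream does not track a journalled txid". */
    static final long NO_TXID = -1L;

    /** Hypothetical stand-in for the base stream class; not Hadoop code. */
    static class BaseStream {
        long getLastJournalledTxId() {
            return NO_TXID; // default: nothing tracked
        }
    }

    /** Hypothetical stream that remembers the last txid it journalled. */
    static class TrackingStream extends BaseStream {
        private long last = NO_TXID;

        void journal(long txid) {
            last = txid;
        }

        @Override
        long getLastJournalledTxId() {
            return last;
        }
    }

    /** Maps the sentinel to an empty value so callers cannot misread it. */
    static OptionalLong lastJournalled(BaseStream stream) {
        long txid = stream.getLastJournalledTxId();
        return txid == NO_TXID ? OptionalLong.empty() : OptionalLong.of(txid);
    }

    public static void main(String[] args) {
        TrackingStream tracking = new TrackingStream();
        tracking.journal(0L);
        System.out.println(lastJournalled(new BaseStream()).isPresent());
        System.out.println(lastJournalled(tracking).isPresent());
    }
}
```

With a default of 0, the two calls in {{main}} would be indistinguishable to the caller; with -1 they are not.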

The test you added, {{TestObserverNode#testMkdirsRaceWithObserverRead}}, 
demonstrates the stale reads when CRS is used. Thanks for adding this!





> Race condition with async edits logging due to updating txId outside of the 
> namesystem log
> ------------------------------------------------------------------------------------------
>
>                 Key: HDFS-15915
>                 URL: https://issues.apache.org/jira/browse/HDFS-15915
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs, namenode
>            Reporter: Konstantin Shvachko
>            Assignee: Konstantin Shvachko
>            Priority: Major
>         Attachments: HDFS-15915-01.patch, HDFS-15915-02.patch, 
> HDFS-15915-03.patch, testMkdirsRace.patch
>
>
> {{FSEditLogAsync}} creates an {{FSEditLogOp}} and populates its fields inside 
> {{FSNamesystem.writeLock}}. But one essential field, the transaction id of the 
> edits op, remains unset until the operation is scheduled for syncing. At that 
> time {{beginTransaction()}} sets {{FSEditLogOp.txid}} and increments the 
> global transaction count. On a busy NameNode this event can fall outside the 
> write lock. 
> This causes problems for Observer reads. It can also potentially reshuffle 
> transactions, so that the Standby applies them in the wrong order.
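
The timing problem described in the issue can be modelled with a small single-threaded sketch. Everything here is a simplified, hypothetical model ({{Op}}, {{Log}} and {{observedAfterUnlock}} are invented names, not Hadoop classes), and it assumes only what the description states: the op is queued under the write lock, while the txid may be assigned later by the sync path. It compares the last txid visible right after the lock is released when the txid is assigned at dequeue time versus at enqueue time.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class TxIdAssignmentSketch {
    /** Hypothetical stand-in for an edit op: a name plus a txid. */
    static final class Op {
        static final long UNSET = -1L;
        final String name;
        long txid = UNSET;

        Op(String name) {
            this.name = name;
        }
    }

    /** Hypothetical, simplified model of an async edit log. */
    static final class Log {
        private final boolean assignOnEnqueue;
        private final Queue<Op> pending = new ArrayDeque<>();
        private long lastTxId = 0;

        Log(boolean assignOnEnqueue) {
            this.assignOnEnqueue = assignOnEnqueue;
        }

        /** Models the step that runs while the write lock is held. */
        void enqueue(Op op) {
            if (assignOnEnqueue) {
                op.txid = ++lastTxId; // txid fixed inside the lock
            }
            pending.add(op);
        }

        /** Models the sync thread, which runs outside the lock. */
        void drain() {
            Op op;
            while ((op = pending.poll()) != null) {
                if (!assignOnEnqueue) {
                    op.txid = ++lastTxId; // txid fixed only now
                }
            }
        }

        long lastTxId() {
            return lastTxId;
        }
    }

    /** Last txid a reader would see after unlock, before the sync thread runs. */
    static long observedAfterUnlock(boolean assignOnEnqueue) {
        Log log = new Log(assignOnEnqueue);
        log.enqueue(new Op("mkdirs"));
        return log.lastTxId();
    }

    public static void main(String[] args) {
        System.out.println("deferred:   " + observedAfterUnlock(false));
        System.out.println("under lock: " + observedAfterUnlock(true));
    }
}
```

In the deferred variant the counter has not yet moved when the lock is released, so a state id derived from it would not cover the queued edit. That is the kind of gap a read served by an Observer could fall into.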


