[ 
https://issues.apache.org/jira/browse/HDFS-15915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17334483#comment-17334483
 ] 

Konstantin Shvachko commented on HDFS-15915:
--------------------------------------------

Attaching a patch to fix the problem. There are a lot of moving parts in 
asynchronous journal logging, so it took me a while to get it working, although the 
actual fix doesn't look complex.
# The main idea is that a new txId is assigned to the journal transaction when 
it is logged by {{logEdit(op)}}, while the call is still under {{fsn.writeLock}}, 
rather than later in {{logSync()}} as it is now.
I think this is the right way to _*guarantee that all transactions are 
journalled in the same order as they were applied on the Active NameNode*_.
# Currently we do not have checks or tests against a mismatch in transaction 
order. This would have been a problem for regular HA with or without Observer. 
I could not build a test that shows the order of transactions can be 
tampered with, but I couldn't convince myself it is impossible either.
The patch adds asserts to guarantee that the journal txIds are in the same order 
as the transactions were applied on the ANN.
# I had to rework {{TestEditLogRace.testDeadlock()}}. I changed it to mock 
{{doEditTransaction()}} instead of {{setTransactionId()}} for the "blocker 
thread". Also, with FSEditLogAsync we can no longer reuse the same operation 
instance for different transactions, because each op now has its txid set 
before syncing. This is [~daryn]'s creation. Would appreciate it if you could take 
a look.
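
For illustration only, the idea in item 1 and the ordering assert in item 2 can be sketched with a simplified model. This is not the patch and does not use the real HDFS classes: {{TxIdOrderSketch}}, {{Op}}, {{applyAndLog()}}, {{drain()}} and {{run()}} are hypothetical names, and a plain {{ReentrantLock}} stands in for {{fsn.writeLock}}. The sketch assumes a single FIFO queue between the logging threads and the sync thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical, simplified model; none of these names are real HDFS classes.
class TxIdOrderSketch {
  /** Stand-in for an edit op. Its txid is set before the op is queued. */
  static final class Op {
    final String name;
    long txid = -1;
    Op(String name) { this.name = name; }
  }

  // Stand-in for the namesystem write lock.
  private final ReentrantLock writeLock = new ReentrantLock();
  // FIFO hand-off from the logging threads to the sync thread.
  private final BlockingQueue<Op> pending = new LinkedBlockingQueue<>();
  private long lastTxId = 0; // guarded by writeLock

  /** Assign the txid and enqueue the op while still holding the write lock. */
  void applyAndLog(String name) {
    writeLock.lock();
    try {
      Op op = new Op(name); // a fresh op per transaction, never reused
      op.txid = ++lastTxId; // txid order now equals apply order
      pending.add(op);      // enqueued under the same lock, so FIFO order matches
    } finally {
      writeLock.unlock();
    }
  }

  /** Sync side: take ops off the queue and check that txids are consecutive. */
  List<Long> drain(int expected) throws InterruptedException {
    List<Long> journal = new ArrayList<>();
    long lastWritten = 0;
    while (journal.size() < expected) {
      Op op = pending.take();
      if (op.txid != lastWritten + 1) {
        throw new IllegalStateException(
            "out-of-order txid " + op.txid + " after " + lastWritten);
      }
      lastWritten = op.txid;
      journal.add(op.txid);
    }
    return journal;
  }

  /** Run several concurrent writers and return the txids in journal order. */
  static List<Long> run(int threads, int opsPerThread) {
    TxIdOrderSketch log = new TxIdOrderSketch();
    List<Thread> writers = new ArrayList<>();
    for (int t = 0; t < threads; t++) {
      final int id = t;
      Thread w = new Thread(() -> {
        for (int i = 0; i < opsPerThread; i++) {
          log.applyAndLog("mkdir-" + id + "-" + i);
        }
      });
      writers.add(w);
      w.start();
    }
    try {
      List<Long> journal = log.drain(threads * opsPerThread);
      for (Thread w : writers) {
        w.join();
      }
      return journal;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while draining", e);
    }
  }

  public static void main(String[] args) {
    List<Long> journal = run(4, 1000);
    System.out.println("journaled " + journal.size() + " transactions in txid order");
  }
}
```

In this model, moving the {{op.txid = ++lastTxId}} assignment out of the locked section and into {{drain()}} would correspond to the current behavior described in the issue, where the txid is only set once the op reaches the sync path.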

> Race condition with async edits logging due to updating txId outside of the 
> namesystem log
> ------------------------------------------------------------------------------------------
>
>                 Key: HDFS-15915
>                 URL: https://issues.apache.org/jira/browse/HDFS-15915
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs, namenode
>            Reporter: Konstantin Shvachko
>            Priority: Major
>         Attachments: testMkdirsRace.patch
>
>
> {{FSEditLogAsync}} creates an {{FSEditLogOp}} and populates its fields inside 
> {{FSNamesystem.writeLock}}. But one essential field, the transaction id of the 
> edits op, remains unset until the time when the operation is scheduled for 
> syncing. At that time {{beginTransaction()}} will set the 
> {{FSEditLogOp.txid}} and increment the global transaction count. On a busy 
> NameNode this event can fall outside the write lock. 
> This causes problems for Observer reads. It can also potentially reshuffle 
> transactions, so that the Standby applies them in the wrong order.



