[
https://issues.apache.org/jira/browse/HDFS-15915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17334483#comment-17334483
]
Konstantin Shvachko commented on HDFS-15915:
--------------------------------------------
Attaching a patch to fix the problem. There are a lot of moving parts in
asynchronous journal logging; it took me a while to get it working, although the
actual fix doesn't look complex.
# The main idea is that a new txId is assigned to the journal transaction when
it is logged by {{logEdit(op)}}, while the call is still under {{fsn.writeLock}},
rather than later in {{logSync()}} as it is now.
I think this is the right way to _*guarantee that all transactions are
journalled in the same order as they were applied on the Active NameNode*_.
# Currently we do not have checks or tests that detect a mismatch in transaction
order. Such a mismatch would be a problem for regular HA, with or without Observer.
I could not build a test showing that the order of transactions can be
tampered with, but I couldn't convince myself it is impossible either.
The patch adds asserts to guarantee that journal txIds are in the same order as
the transactions were applied on the ANN.
# I had to rework {{TestEditLogRace.testDeadlock()}}. I changed it to mock
{{doEditTransaction()}} instead of {{setTransactionId()}} for the "blocker
thread". Also, with FSEditLogAsync we can no longer reuse the same operation
instance for different transactions, since the txid is now set in it before
syncing. This is [~daryn]'s creation. Would appreciate it if you could take a
look.
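To make the ordering argument concrete, here is a minimal Java sketch of what the patch aims for. It is a toy model under stated assumptions, not the actual {{FSEditLogAsync}} code: the class {{TxIdUnderLockSketch}}, its {{Op}} stand-in for {{FSEditLogOp}}, and the {{mkdir}}/{{syncOne}} methods are hypothetical, and a {{LinkedBlockingQueue}} drained by a single thread stands in for the async sync thread. The txid is assigned inside {{logEdit}} while the caller holds the write lock, and the sync side checks that txids arrive contiguous and increasing, roughly the kind of assert described in item 2.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical toy model, not Hadoop code: txids are assigned in logEdit()
// while the caller holds the namesystem write lock, and the sync side
// checks that txids arrive contiguous and increasing.
public class TxIdUnderLockSketch {
  // Stand-in for FSEditLogOp with only the fields this sketch needs.
  static final class Op {
    final String name;
    long txid = -1; // unset until logEdit() assigns it
    Op(String name) { this.name = name; }
  }

  final ReentrantReadWriteLock fsnLock = new ReentrantReadWriteLock();
  final BlockingQueue<Op> pending = new LinkedBlockingQueue<>();
  final List<String> appliedOrder = new ArrayList<>(); // guarded by the write lock
  final List<Op> journal = new ArrayList<>();          // touched only by the syncer
  private long lastAssignedTxId = 0;                   // guarded by the write lock
  private long lastSyncedTxId = 0;                     // touched only by the syncer

  // Caller must hold the write lock (assumption mirroring logEdit(op) in the patch).
  void logEdit(Op op) {
    if (!fsnLock.isWriteLockedByCurrentThread()) {
      throw new IllegalStateException("logEdit() called without the write lock");
    }
    op.txid = ++lastAssignedTxId; // txid fixed in the same lock hold as the mutation
    pending.add(op);              // so queue order == txid order == apply order
  }

  // One client write: apply the mutation and log it under the same lock hold.
  void mkdir(String path) {
    fsnLock.writeLock().lock();
    try {
      appliedOrder.add(path);
      logEdit(new Op(path));
    } finally {
      fsnLock.writeLock().unlock();
    }
  }

  // Sync side: runs outside the lock and only checks and journals the op.
  void syncOne(Op op) {
    if (op.txid != lastSyncedTxId + 1) {
      throw new IllegalStateException(
          "txid " + op.txid + " journalled after " + lastSyncedTxId);
    }
    lastSyncedTxId = op.txid;
    journal.add(op);
  }

  public static void main(String[] args) throws InterruptedException {
    final int writers = 4, editsPerWriter = 1000, total = writers * editsPerWriter;
    TxIdUnderLockSketch log = new TxIdUnderLockSketch();
    Thread syncer = new Thread(() -> {
      try {
        for (int n = 0; n < total; n++) log.syncOne(log.pending.take());
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    syncer.start();
    List<Thread> clients = new ArrayList<>();
    for (int w = 0; w < writers; w++) {
      final int id = w;
      Thread t = new Thread(() -> {
        for (int i = 0; i < editsPerWriter; i++) log.mkdir("/d" + id + "_" + i);
      });
      clients.add(t);
      t.start();
    }
    for (Thread t : clients) t.join();
    syncer.join();
    // The journal must list edits in exactly the order they were applied.
    for (int i = 0; i < total; i++) {
      if (!log.journal.get(i).name.equals(log.appliedOrder.get(i))) {
        throw new AssertionError("journal and apply order differ at " + i);
      }
    }
    System.out.println("journalled " + log.journal.size() + " edits in apply order");
  }
}
```

Because the txid is assigned in the same lock hold as the mutation, queue order, txid order and apply order coincide, so the check in {{syncOne}} should never fire in {{main}}. The same model also shows the point of item 3: logging one {{Op}} instance twice before it is synced overwrites its first txid, and {{syncOne}} then rejects it as out of order.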
> Race condition with async edits logging due to updating txId outside of the
> namesystem log
> ------------------------------------------------------------------------------------------
>
> Key: HDFS-15915
> URL: https://issues.apache.org/jira/browse/HDFS-15915
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs, namenode
> Reporter: Konstantin Shvachko
> Priority: Major
> Attachments: testMkdirsRace.patch
>
>
> {{FSEditLogAsync}} creates an {{FSEditLogOp}} and populates its fields inside
> {{FSNamesystem.writeLock}}. But one essential field, the transaction id of the
> edits op, remains unset until the operation is scheduled for syncing. At that
> time {{beginTransaction()}} sets {{FSEditLogOp.txid}} and increments the
> global transaction count. On a busy NameNode this can happen outside the
> write lock.
> This causes problems for Observer reads. It can also potentially reshuffle
> transactions, causing the Standby to apply them in the wrong order.
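The window described in the issue can be illustrated with a similar toy model of the pre-patch behaviour. Again this is a hypothetical sketch, not Hadoop code: the class {{LateTxIdSketch}} and its {{mkdir}}/{{syncOne}} methods are made up for illustration, and {{syncOne}} merely plays the role that {{beginTransaction()}} has in the description. The op is queued under the write lock, but its txid and the global count only advance when the sync side picks it up.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical toy model, not Hadoop code, of the pre-patch behaviour: the op
// is built and queued under the write lock, but its txid and the global
// transaction count only advance when the sync side picks it up.
public class LateTxIdSketch {
  static final class Op {
    final String name;
    long txid = -1; // still unset when the write lock is released
    Op(String name) { this.name = name; }
  }

  final ReentrantReadWriteLock fsnLock = new ReentrantReadWriteLock();
  final BlockingQueue<Op> pending = new LinkedBlockingQueue<>();
  volatile long lastTxId = 0; // global transaction count, advanced only by the syncer

  // Client write: mutate and enqueue under the lock, and return the txid a
  // reader could observe while the lock is still held.
  long mkdir(String path) {
    fsnLock.writeLock().lock();
    try {
      // ... the namespace mutation for `path` would be applied here ...
      pending.add(new Op(path)); // txid NOT assigned yet
      return lastTxId;           // does not yet account for this mutation
    } finally {
      fsnLock.writeLock().unlock();
    }
  }

  // Sync side: assigns the txid only now, outside the write lock.
  Op syncOne() {
    Op op = pending.poll();
    if (op != null) {
      op.txid = ++lastTxId;
    }
    return op;
  }

  public static void main(String[] args) {
    LateTxIdSketch log = new LateTxIdSketch();
    long seen = log.mkdir("/a");
    System.out.println("lock released: lastTxId=" + seen
        + ", queued txid=" + log.pending.peek().txid);
    Op op = log.syncOne();
    System.out.println("synced " + op.name + " as txid " + op.txid);
  }
}
```

In this model {{mkdir("/a")}} returns 0 while the queued op still has txid -1; only {{syncOne()}} gives it txid 1. A reader that relies on the txid to decide whether it has caught up, roughly what Observer reads depend on, can therefore be handed a state that does not yet include a mutation already applied in memory.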
--
This message was sent by Atlassian Jira
(v8.3.4#803005)