[
https://issues.apache.org/jira/browse/HDFS-16493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
liutongwei updated HDFS-16493:
------------------------------
Description:
Although fast path tailing uses a quorum read to pull the edit log, it appears
that it can read uncommitted data in some corner cases.
Here is an example. Suppose we have three JNs whose initial state is:
{code:java}
epoch 1
JN1 [1-3](in-progress)
JN2 [1-3](in-progress)
JN3 [1-4](in-progress)
Note that in epoch 1, txid 1-3 were committed and txid 4 was not.
{code}
When a failover occurs, suppose the new writer cannot contact JN3 because of a
network partition. It finishes the recovery stage and writes a new txid 4 in
epoch 2, whose value differs from the txid 4 held by JN3.
{code:java}
epoch 2
JN1 [1-3](finalized) [4-4](in-progress)
JN2 [1-3](finalized) [4-4](in-progress)
JN3 [1-4](in-progress)
Note that txid 4 on JN3 has a different value from txid 4 on the other JNs.
{code}
Now a namenode that serves reads (standby or observer) pulls edits and contacts
JN3 and JN2, so it gets a majority response. However, the two logs have the same
length but different content, and there is no further information to decide
which log is correct. If it chooses JN3's log, the result is metadata corruption.
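The ambiguity described above can be reproduced with a small toy model. This is only a sketch: the {{Reply}} class and {{selectByLength}} method are hypothetical names invented for illustration, not Hadoop's QJM classes, and the selection rule is simplified to "serve what every reply in the quorum can supply, taking the content from the first reply".

```java
import java.util.Arrays;
import java.util.List;

// Toy model only; none of these types exist in Hadoop.
class QuorumReadSketch {

    /** One journal node's reply: edit payloads starting at the requested txid. */
    static final class Reply {
        final String journal;
        final List<String> edits;

        Reply(String journal, String... edits) {
            this.journal = journal;
            this.edits = Arrays.asList(edits);
        }
    }

    /**
     * Length-only selection: serve as many edits as every reply in the quorum
     * can supply, copying the content from the first reply. Nothing here can
     * tell a stale payload from a current one. Assumes a non-empty quorum.
     */
    static List<String> selectByLength(List<Reply> quorum) {
        int served = Integer.MAX_VALUE;
        for (Reply r : quorum) {
            served = Math.min(served, r.edits.size());
        }
        return quorum.get(0).edits.subList(0, served);
    }

    public static void main(String[] args) {
        // The reader has applied txid 1-3 and asks for edits starting at txid 4.
        Reply jn2 = new Reply("JN2", "txid4@epoch2"); // written by the new writer
        Reply jn3 = new Reply("JN3", "txid4@epoch1"); // stale, never committed

        // Same quorum, same lengths; the result depends only on reply order.
        System.out.println(selectByLength(Arrays.asList(jn2, jn3))); // [txid4@epoch2]
        System.out.println(selectByLength(Arrays.asList(jn3, jn2))); // [txid4@epoch1]
    }
}
```

Both calls see a majority (two of three JNs) and one edit per reply, so a check based on length alone accepts either ordering.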
The attached test patch [^example.patch] can be used to run and debug this case.
To fix it, I think we should add the finalized state to
{{GetJournaledEditsResponseProto}}, so that the faulty log can be discarded.
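One possible reading of this proposal is sketched below. It is not the actual patch. The fields {{segmentStartTxId}} and {{lastFinalizedTxId}} are assumptions made for illustration and are not existing fields of {{GetJournaledEditsResponseProto}}. The assumed rule is: if any reply reports that txids up to N are finalized, a reply that serves edits from an in-progress segment starting at or before N must come from an older writer and is discarded. The remaining replies must still form a majority, otherwise nothing is served.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model only; the extra reply fields are hypothetical.
class FinalizedAwareReadSketch {

    static final class Reply {
        final String journal;
        final long segmentStartTxId;  // first txid of the in-progress segment serving the edits
        final long lastFinalizedTxId; // highest txid this JN holds in a finalized segment, 0 if none
        final List<String> edits;     // payloads starting at the requested txid

        Reply(String journal, long segmentStartTxId, long lastFinalizedTxId, String... edits) {
            this.journal = journal;
            this.segmentStartTxId = segmentStartTxId;
            this.lastFinalizedTxId = lastFinalizedTxId;
            this.edits = Arrays.asList(edits);
        }
    }

    /** Discards stale replies, then serves edits only if a majority remains. */
    static List<String> select(List<Reply> quorum, int majority) {
        long maxFinalized = 0;
        for (Reply r : quorum) {
            maxFinalized = Math.max(maxFinalized, r.lastFinalizedTxId);
        }
        List<Reply> kept = new ArrayList<>();
        for (Reply r : quorum) {
            // An in-progress segment that overlaps a finalized range is stale.
            if (r.segmentStartTxId > maxFinalized) {
                kept.add(r);
            }
        }
        if (kept.size() < majority) {
            return Collections.emptyList(); // not enough trustworthy replies; retry later
        }
        int served = Integer.MAX_VALUE;
        for (Reply r : kept) {
            served = Math.min(served, r.edits.size());
        }
        return kept.get(0).edits.subList(0, served);
    }

    public static void main(String[] args) {
        Reply jn1 = new Reply("JN1", 4, 3, "txid4@epoch2");
        Reply jn2 = new Reply("JN2", 4, 3, "txid4@epoch2");
        Reply jn3 = new Reply("JN3", 1, 0, "txid4@epoch1"); // stale [1-4] segment

        System.out.println(select(Arrays.asList(jn3, jn2), 2));      // []
        System.out.println(select(Arrays.asList(jn3, jn1, jn2), 2)); // [txid4@epoch2]
    }
}
```

In the JN3 plus JN2 case from the example, the stale reply is dropped and the reader serves nothing rather than guessing. The rule is deliberately conservative: a JN that merely missed the finalize call would also be discarded.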
> [SBN Read]When fast path tail enabled, standby or observer namenode may read
> uncommitted data
> ---------------------------------------------------------------------------------------------
>
> Key: HDFS-16493
> URL: https://issues.apache.org/jira/browse/HDFS-16493
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: journal-node, namanode
> Reporter: liutongwei
> Priority: Critical
> Attachments: example.patch