Jian Zhang created HDFS-17198:
---------------------------------
Summary: RBF: fix bug of getRepresentativeQuorum
Key: HDFS-17198
URL: https://issues.apache.org/jira/browse/HDFS-17198
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Jian Zhang
h2. *Bug description*
In the original implementation, when the routers report the nn status at
different times, the resulting nn status is the one reported by the majority of
routers. For example:
router1 -> nn0:active dateModified:1
router2 -> nn0:active dateModified:2
router3 -> nn0:active dateModified:3
router0 -> nn0:standby dateModified:4
Then the status of nn0 is active, because the majority of routers report that
nn0 is active.
If the majority of routers report the nn status at the same time, for example:
(record1) router1 -> nn0:active dateModified:1
(record2) router2 -> nn0:active dateModified:1
(record3) router3 -> nn0:active dateModified:1
(record4) router0 -> nn0:standby dateModified:2
Then the status of nn0 is standby, but we expect it to be active.
This happens because the records above are put into a TreeSet in the method
getRepresentativeQuorum. Since record1, record2, and record3 have the same
dateModified, only one of them remains in the final TreeSet, so the method
concludes that nn0 is standby because record4 is newer.
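The collapse can be illustrated outside Hadoop with a minimal, self-contained
sketch. This is not the actual getRepresentativeQuorum code: the Report class,
the method names, and the tie-breaking rule (largest group wins, ties go to the
newest dateModified) are assumptions made only for illustration. It shows that a
TreeSet ordered by dateModified alone keeps a single record per timestamp,
while a plain list keeps all three active reports.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

class QuorumSketch {
    /** Hypothetical stand-in for one router's report about a namenode. */
    static final class Report {
        final String router;
        final String state;
        final long dateModified;

        Report(String router, String state, long dateModified) {
            this.router = router;
            this.state = state;
            this.dateModified = dateModified;
        }
    }

    // Compares by dateModified only, so reports with equal timestamps are
    // treated as duplicates by a TreeSet that uses this comparator.
    static final Comparator<Report> BY_DATE =
        Comparator.comparingLong((Report r) -> r.dateModified);

    // Assumed selection rule: largest group wins; ties go to the newest report.
    static <C extends Collection<Report>> String pickState(Map<String, C> byState) {
        String best = null;
        int bestSize = -1;
        long bestDate = Long.MIN_VALUE;
        for (Map.Entry<String, C> e : byState.entrySet()) {
            long newest = Long.MIN_VALUE;
            for (Report r : e.getValue()) {
                newest = Math.max(newest, r.dateModified);
            }
            int size = e.getValue().size();
            if (size > bestSize || (size == bestSize && newest > bestDate)) {
                best = e.getKey();
                bestSize = size;
                bestDate = newest;
            }
        }
        return best;
    }

    // Groups reports per state in a TreeSet: equal timestamps collapse to one.
    static String quorumWithTreeSet(List<Report> reports) {
        Map<String, TreeSet<Report>> byState = new HashMap<>();
        for (Report r : reports) {
            byState.computeIfAbsent(r.state, s -> new TreeSet<>(BY_DATE)).add(r);
        }
        return pickState(byState);
    }

    // Groups reports per state in a List: every report is counted.
    static String quorumWithList(List<Report> reports) {
        Map<String, List<Report>> byState = new HashMap<>();
        for (Report r : reports) {
            byState.computeIfAbsent(r.state, s -> new ArrayList<>()).add(r);
        }
        return pickState(byState);
    }

    // The four records from the second example in the description.
    static List<Report> sameTimeReports() {
        return Arrays.asList(
            new Report("router1", "active", 1),
            new Report("router2", "active", 1),
            new Report("router3", "active", 1),
            new Report("router0", "standby", 2));
    }

    public static void main(String[] args) {
        System.out.println(quorumWithTreeSet(sameTimeReports())); // standby
        System.out.println(quorumWithList(sameTimeReports()));    // active
    }
}
```

In this sketch the TreeSet variant sees one active record and one standby
record, so the newer standby record wins; the list variant counts three active
records against one standby record.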
h2. *How to reproduce*
Run my unit test testRegistrationMajorityQuorumEqDateModified against the
original code.