Hi community,

The Ozone SCM HA work [1] is in progress. Ozone SCM HA uses Ratis to reach
consensus on state. While working on it, one of the hard problems I found
is split-brain, in which two leaders coexist, so SCM HA needs to deal with
stale commands from the old leader.

One of the challenges is how to simulate a network partition so we can
write meaningful tests that verify how the implementation handles stale
commands. This will probably require:

1. Have a config that keeps the old leader from ever turning into a
candidate (e.g. by increasing the re-election timeout).
2. Have a way to block the inbound and outbound communication of the
leader, thereby creating a network partition.

Item 1 should be easy to do. Does anyone know how to tackle item 2?
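
To make item 2 concrete, here is a minimal sketch of the kind of hook I have
in mind. It is not Ratis or Ozone code: PartitionableNetwork, isolate, heal
and send are hypothetical names, and the sketch assumes the test can route
every message between SCM nodes through one in-memory transport. Messages to
or from an isolated node are silently dropped.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical in-memory transport for tests; not part of Ratis or Ozone. */
public class PartitionableNetwork {
    // IDs of nodes whose inbound and outbound traffic is currently dropped.
    private final Set<String> isolated = new HashSet<>();
    // Messages that actually got through, recorded as "from->to:payload".
    private final List<String> delivered = new ArrayList<>();

    /** Cut the node off from every other node, in both directions. */
    public synchronized void isolate(String nodeId) {
        isolated.add(nodeId);
    }

    /** Reconnect a previously isolated node. */
    public synchronized void heal(String nodeId) {
        isolated.remove(nodeId);
    }

    /** Deliver the message unless either endpoint is isolated. */
    public synchronized boolean send(String from, String to, String payload) {
        if (isolated.contains(from) || isolated.contains(to)) {
            return false; // dropped, as in a real partition
        }
        delivered.add(from + "->" + to + ":" + payload);
        return true;
    }

    /** Snapshot of everything delivered so far. */
    public synchronized List<String> delivered() {
        return new ArrayList<>(delivered);
    }
}
```

A test could then isolate the current leader, wait for the remaining nodes
to elect a new one, heal the partition, and check that commands still issued
by the old leader are rejected.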


[1]: https://issues.apache.org/jira/browse/HDDS-2823


-Rui
