Hi dev,

Correctly scaling a Raft cluster has always been a non-trivial task. It was 
discussed previously on the user mailing list, see 
https://lists.apache.org/[email protected]:2022-7:scale. In that thread we 
reached a consensus, which Tsz-Wo summarized as follows:

> The original design was to call setConf first and then start the nodes. I 
> understand that it may not be convenient to enforce such an order. We may 
> consider having an allow-list so that the Leader won't shut down the nodes in 
> the allow-list.

IoTDB adopted this order (setConf first, then start) in a recent PR, 
https://github.com/apache/iotdb/pull/7712. However, new errors occurred; 
https://issues.apache.org/jira/browse/RATIS-1749 is one example.

So I looked at how other Ratis users handle this and found that Ozone and 
Alluxio both start the server first and then call setConf. IoTDB originally 
used the same order, although SHUTDOWN errors did occur a few times.

So I’m a little confused now. Is there a best practice that Ratis recommends 
for scaling a Raft cluster? Any suggestions would be much appreciated.

Best Regards,
William
