Hi dev,

Correctly scaling the cluster has always been a non-trivial task. There were some previous discussions on the user mailing list, see https://lists.apache.org/[email protected]:2022-7:scale. In that thread, we reached a consensus, which Tsz-Wo summarized as follows:
> The original design was to call setConf first and then start the nodes. I
> understand that it may not be convenient to enforce such an order. We may
> consider having an allow-list so that the Leader won't shut down the nodes in
> the allow-list.

IoTDB adopted this order (setConf first, then start) in a recent PR, https://github.com/apache/iotdb/pull/7712. However, new errors occurred; https://issues.apache.org/jira/browse/RATIS-1749 is one example.

So I looked into how other Ratis users handle this and found that Ozone and Alluxio both start the server first and then call setConf. IoTDB originally used the same order, although SHUTDOWN errors did occur a few times.

So I'm a little confused now. Is there a best practice that Ratis recommends for scaling a Raft cluster? Any suggestions would be much appreciated.

Best Regards,
William
