This is an automated email from the ASF dual-hosted git repository.
maoling pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/zookeeper.git
The following commit(s) were added to refs/heads/master by this push:
new 7fdadf7 ZOOKEEPER-3764: Add High Availability Guarantee Into Docs
7fdadf7 is described below
commit 7fdadf7273f34dd0552db25a3771cf55b65e9208
Author: Winbobob <[email protected]>
AuthorDate: Sat Apr 10 13:41:38 2021 +0800
ZOOKEEPER-3764: Add High Availability Guarantee Into Docs
Include the formula for calculating the maximum number of server failures
in the ZK doc.
> https://issues.apache.org/jira/browse/ZOOKEEPER-3764
Author: Winbobob <[email protected]>
Reviewers: maoling <[email protected]>
Closes #1661 from Winbobob/ZOOKEEPER-3764 and squashes the following
commits:
40d7815e2 [Winbobob] Fix a typo
b9eda4dc1 [Winbobob] ZOOKEEPER-3764: Add High Availability Guarantee Into
Docs
---
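A minimal sketch of the formula this commit documents, assuming the only rule is that a strict majority of the N servers must stay up. The function names below are hypothetical and are not part of ZooKeeper or of this change.

```python
def quorum_size(n: int) -> int:
    """Smallest number of servers that forms a strict majority of n."""
    if n < 1:
        raise ValueError("an ensemble needs at least one server")
    return n // 2 + 1


def max_tolerated_failures(n: int) -> int:
    """Number of servers that can fail while a majority of n remains."""
    # Equals N/2 rounded down for odd N and N/2-1 for even N.
    return n - quorum_size(n)


# Ensemble sizes used as examples in the documentation change.
for n in (3, 5, 6):
    print(n, quorum_size(n), max_tolerated_failures(n))
```

Under this assumption, 5-server and 6-server ensembles both tolerate 2 failures, which is the reason the doc recommends odd ensemble sizes.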
.../src/main/resources/markdown/zookeeperAdmin.md | 29 ++++++++++++++++------
1 file changed, 21 insertions(+), 8 deletions(-)
diff --git a/zookeeper-docs/src/main/resources/markdown/zookeeperAdmin.md b/zookeeper-docs/src/main/resources/markdown/zookeeperAdmin.md
index 9fefadf..e6383fd 100644
--- a/zookeeper-docs/src/main/resources/markdown/zookeeperAdmin.md
+++ b/zookeeper-docs/src/main/resources/markdown/zookeeperAdmin.md
@@ -321,14 +321,27 @@ machine in your deployment.
For the ZooKeeper service to be active, there must be a
majority of non-failing machines that can communicate with
-each other. To create a deployment that can tolerate the
-failure of F machines, you should count on deploying 2xF+1
-machines. Thus, a deployment that consists of three machines
-can handle one failure, and a deployment of five machines can
-handle two failures. Note that a deployment of six machines
-can only handle two failures since three machines is not a
-majority. For this reason, ZooKeeper deployments are usually
-made up of an odd number of machines.
+each other. For a ZooKeeper ensemble with N servers,
+if N is odd, the ensemble is able to tolerate up to N/2
+(rounded down) server failures without losing any znode data;
+if N is even, the ensemble is able to tolerate up to N/2-1
+server failures.
+
+For example, if we have a ZooKeeper ensemble with 3 servers,
+the ensemble is able to tolerate up to 1 (3/2, rounded down) server failure.
+If we have a ZooKeeper ensemble with 5 servers,
+the ensemble is able to tolerate up to 2 (5/2, rounded down) server failures.
+If the ZooKeeper ensemble has 6 servers, the ensemble
+is also able to tolerate up to 2 (6/2-1) server failures
+without losing data, while still preventing the "split-brain" issue.
+
+A ZooKeeper ensemble usually has an odd number of servers.
+This is because, with an even number of servers,
+the failure tolerance is the same as that of
+an ensemble with one fewer server
+(2 failures for both a 5-node ensemble and a 6-node ensemble),
+but the ensemble has to maintain extra connections and
+data transfers for one more server.
To achieve the highest probability of tolerating a failure
you should try to make machine failures independent. For