lvfangmin commented on a change in pull request #1059: [ZOOKEEPER-3512] Real time data consistency check during broadcast time
URL: https://github.com/apache/zookeeper/pull/1059#discussion_r355199180
 
 

 ##########
 File path: zookeeper-docs/src/main/resources/markdown/zookeeperAdmin.md
 ##########
 @@ -964,8 +964,59 @@ property, when available, is noted below.
 * *digest.enabled* :
     (Java system property only: **zookeeper.digest.enabled**)
     **New in 3.6.0:**
-    The digest feature is added to self-verify the correctness inside
-    ZooKeeper when loading database from disk, and syncing with leader.
+    The digest feature is added to detect data inconsistency inside
+    ZooKeeper when loading the database from disk, and when catching up with
+    and following the leader. It performs an incremental hash check of the
+    DataTree based on the AdHash scheme described in
+
+        https://cseweb.ucsd.edu/~daniele/papers/IncHash.pdf
+
+    The idea is simple: the hash value of the DataTree is updated incrementally
+    based on the changes to the set of data. When the leader prepares a txn,
+    it pre-calculates the hash of the tree based on the changes, using the
+    formula:
+
+        current_hash = current_hash + hash(new node data) - hash(old node data)
+
+    When creating a new node, hash(old node data) is 0; when deleting a node,
+    hash(new node data) is 0.
+
+    This hash is associated with each txn and represents the expected hash value
+    after the txn is applied to the data tree. It is sent to followers along with
+    the original proposals. A learner compares its actual hash value with the one
+    in the txn after applying the txn to its data tree, and reports a mismatch if
+    the two differ.
+
+    These digest values are also persisted with each txn and snapshot on disk,
+    so when a server restarts and loads data from disk, it checks for a hash
+    mismatch, which helps detect data loss on disk.
+
+    The actual hash function used internally is CRC. It is not a
+    collision-resistant hash function, but it is more efficient than one, and
+    collisions are rare enough to meet our needs here.
+
+    This feature is backward and forward compatible, so it is safe to do a rolling
+    upgrade or downgrade, or to enable it and later disable it, without any
+    compatibility issues. The following scenarios have been covered and tested:
+
+    1. When the leader runs the new code and a follower runs the old code, the
+       digest is appended to the end of each txn. The follower reads only the
+       header and txn data and ignores the digest value, so reading and
+       processing the next txn is not affected.
+    2. When the leader runs the old code and a follower runs the new code, the
+       digest is not sent with the txn. When the follower tries to read the
+       digest, an EOF exception is thrown, which is caught and handled gracefully
+       by setting the digest value to null.
+    3. When loading an old snapshot with the new code, an IOException is thrown
+       when trying to read the nonexistent digest value. The exception is caught
+       and the digest is set to null, which means the digest is not compared when
+       loading this snapshot. This is expected to happen during a rolling upgrade.
+    4. When loading a new snapshot with the old code, loading finishes successfully
+       after deserializing the data tree, and the digest value at the end of the
+       snapshot file is ignored.
+    5. Rolling restarts with flag changes are similar to scenarios 1 and 2 above.
+       If the leader has the feature enabled but a follower does not, the digest
+       value is ignored and the follower does not compare digests at runtime. If
+       the leader has it disabled but a follower has it enabled, the follower gets
+       an EOF exception, which is handled gracefully.
+
     By default, this feature is disabled; set it to "true" to enable it.
 
 Review comment:
   That's a good suggestion, I'll add that.
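
   For anyone following the thread, here is a rough sketch of the incremental update formula and the "missing digest means null" read path described in the doc change above. It is only an illustration under stated assumptions: the class `IncrementalDigestSketch` and its methods are hypothetical names, not code from this PR, and hashing the path plus the data bytes with `java.util.zip.CRC32` is an assumption about what goes into a node's hash.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical sketch; names and hashed fields are assumptions, not the PR's code.
final class IncrementalDigestSketch {
    // Running digest of the whole tree. long arithmetic wraps modulo 2^64,
    // so additions and subtractions can be applied in any order.
    private long digest;

    // Per-node hash: CRC32 over the path and the data bytes (assumed inputs).
    static long nodeHash(String path, byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(path.getBytes(StandardCharsets.UTF_8));
        crc.update(data);
        return crc.getValue();
    }

    // Create: hash(old node data) is treated as 0.
    void create(String path, byte[] data) {
        digest += nodeHash(path, data);
    }

    // Delete: hash(new node data) is treated as 0.
    void delete(String path, byte[] oldData) {
        digest -= nodeHash(path, oldData);
    }

    // Update: current_hash = current_hash + hash(new) - hash(old).
    void setData(String path, byte[] oldData, byte[] newData) {
        digest = digest + nodeHash(path, newData) - nodeHash(path, oldData);
    }

    long digest() {
        return digest;
    }

    // Reads a trailing 8-byte digest if present. If the stream ends first
    // (for example, data written without a digest), returns null so the
    // caller can skip the comparison.
    static Long readOptionalDigest(DataInputStream in) throws IOException {
        try {
            return in.readLong();
        } catch (EOFException e) {
            return null;
        }
    }
}
```

   With this shape, two trees that reach the same set of nodes through different operation sequences end up with the same digest, and a reader that finds no digest simply gets null and does not compare.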

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
