[
https://issues.apache.org/jira/browse/ZOOKEEPER-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968198#comment-13968198
]
Germán Blanco commented on ZOOKEEPER-1794:
------------------------------------------
Thank you for looking into this and for your questions, Flavio.
For ZooKeeper to work, it is necessary to have the same history of transactions
in all servers. The algorithm and the protocol ensure that this is so, and the
implementation follows those principles in order to have a working ZooKeeper
cluster. So *in theory* a ZooKeeper cluster will always have consistent data
across the servers. However, there is currently no way to verify that this
holds in practice. This means that if, for whatever reason (software fault,
manual operation mistake, installation that doesn't follow all principles),
something causes an inconsistency in the data, that inconsistency is never
detected.
The proposal here is to add a checksum so that it is possible to verify that
the information in all servers is the same, not only for the information on
disk but also for the information in memory, up to the last committed
transaction. This means that inconsistencies will be detected quickly. It
might also enable new scenarios.
Things like split-brain will be detected if they ever happen. It might be safe
(or at least less unsafe) to use RAMDISK for the servers, at least as long as
data is short-lived (i.e. it is ok to lose all data in the cluster, but it is
not ok to have inconsistencies), because once there is a way to detect
inconsistencies, it seems easy to e.g. go back to the last correct state or go
back to position 0.
There might be other options for this, but I don't know of any.
> Add hash check to transaction history in quorum servers
> -------------------------------------------------------
>
> Key: ZOOKEEPER-1794
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1794
> Project: ZooKeeper
> Issue Type: Sub-task
> Components: quorum
> Affects Versions: 3.5.0
> Reporter: Germán Blanco
> Assignee: Germán Blanco
> Fix For: 3.6.0
>
> Attachments: ZOOKEEPER-1794.patch, ZOOKEEPER-1794.patch,
> ZOOKEEPER-1794.patch
>
> Original Estimate: 336h
> Remaining Estimate: 336h
>
> The goal of this task is to add a hash number to each transaction in the
> transaction history. This hash number will depend on the entire transaction
> history. This hash number will be the same in all members of the quorum,
> since it shall have the same result if the members have the same transaction
> history. That means that there will be no need to send any new information
> between members of the quorum, during the broadcast phase. The hash number
> will be checked by the leader when learners try to connect, and it shall also
> be sent together with the snapshot during synchronisation. If the hash number
> does not match, the synchronisation shall be done with a snapshot in order to
> overwrite the conflicts in the transaction history.
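
The description above can be illustrated with a minimal sketch of a running
hash, where each transaction's hash depends on the hash of everything before
it. This is not the attached patch and not ZooKeeper code: the class
TxnHashChain, its method names, the choice of SHA-256, the all-zero starting
value and the zxid numbering 1, 2, 3, ... are all assumptions made only for
illustration.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

// Hypothetical illustration only; none of these names exist in ZooKeeper.
final class TxnHashChain {
    // Hash of the empty history: 32 zero bytes (an arbitrary choice for this sketch).
    static final byte[] GENESIS = new byte[32];

    // next = SHA-256(previous hash || zxid || serialized transaction),
    // so the result depends on the entire history up to this transaction.
    static byte[] extend(byte[] previous, long zxid, byte[] txn) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(previous);
            md.update(ByteBuffer.allocate(Long.BYTES).putLong(zxid).array());
            md.update(txn);
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    // Folds a whole history into its running hash.
    // Assumption for this sketch: transactions are numbered 1, 2, 3, ...
    static byte[] hashOf(List<byte[]> history) {
        byte[] hash = GENESIS;
        long zxid = 1;
        for (byte[] txn : history) {
            hash = extend(hash, zxid++, txn);
        }
        return hash;
    }

    // The kind of comparison a leader could make when a learner connects:
    // equal hashes suggest equal histories, different hashes reveal a conflict.
    static boolean sameHistory(byte[] leaderHash, byte[] learnerHash) {
        return MessageDigest.isEqual(leaderHash, learnerHash);
    }
}
```

Since every server computes the value locally from its own history, nothing
extra has to be sent during broadcast; only the latest hash needs to be
compared when a learner connects. In this sketch a mismatch only tells you that
the histories differ somewhere, not where, which fits the proposal of falling
back to a snapshot in that case.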
--
This message was sent by Atlassian JIRA
(v6.2#6252)