[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968198#comment-13968198
 ] 

Germán Blanco commented on ZOOKEEPER-1794:
------------------------------------------

Thank you for looking into this, and for your questions, Flavio.
For ZooKeeper to work, all servers must have the same transaction history. 
The algorithm and the protocol ensure this, and the implementation follows 
those principles. So *in theory* a ZooKeeper cluster always has consistent 
data across its servers. However, there is currently no way to verify this. 
If for whatever reason (a software fault, an operator mistake, an installation 
that doesn't follow all the principles) the data becomes inconsistent, the 
inconsistency is never detected.
The proposal here is to add a checksum so that it is possible to verify that 
the information in all servers is the same, not only on disk but also in 
memory, up to the last committed transaction. This means that inconsistencies 
will be detected quickly. It might also enable new scenarios. Split-brain 
situations would be detected if they ever happen. It might be safe (or at 
least less unsafe) to use a RAM disk for the servers, at least as long as the 
data is short-lived (i.e. it is OK to lose all data in the cluster, but not 
OK to have inconsistencies), because once there is a way to detect 
inconsistencies, it seems easy to e.g. go back to the last correct state or 
to position 0.
There might be other options for this, but I don't know of any.
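
To make the detection idea concrete, here is a minimal sketch in Python. It is 
not ZooKeeper code: the helper names `chain_digests` and `last_common_index`, 
the choice of SHA-256 and the all-zero starting value are assumptions made 
only for illustration. It keeps a running digest per transaction and finds the 
last position at which two servers still agree, which is the state one could 
go back to.

```python
import hashlib


def chain_digests(transactions):
    """Return the running digest after each transaction (hypothetical helper).

    Each digest covers the previous digest plus the current transaction,
    so it depends on the whole history up to that point.
    """
    digest = b"\x00" * 32  # assumed starting value for an empty history
    digests = []
    for txn in transactions:
        digest = hashlib.sha256(digest + txn).digest()
        digests.append(digest)
    return digests


def last_common_index(digests_a, digests_b):
    """Index of the last transaction on which both histories agree, or -1."""
    last = -1
    for i, (a, b) in enumerate(zip(digests_a, digests_b)):
        if a != b:
            break
        last = i
    return last


# Two servers whose histories diverge at the second transaction.
server_a = [b"create /a", b"set /a 1", b"delete /a"]
server_b = [b"create /a", b"set /a 2", b"delete /a"]
agreed = last_common_index(chain_digests(server_a), chain_digests(server_b))
```

In this example `agreed` is 0, i.e. only the first transaction is common. A 
result of -1 would correspond to the "go back to position 0" case.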

> Add hash check to transaction history in quorum servers
> -------------------------------------------------------
>
>                 Key: ZOOKEEPER-1794
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1794
>             Project: ZooKeeper
>          Issue Type: Sub-task
>          Components: quorum
>    Affects Versions: 3.5.0
>            Reporter: Germán Blanco
>            Assignee: Germán Blanco
>             Fix For: 3.6.0
>
>         Attachments: ZOOKEEPER-1794.patch, ZOOKEEPER-1794.patch, 
> ZOOKEEPER-1794.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> The goal of this task is to add a hash value to each transaction in the 
> transaction history. Each hash value depends on the entire transaction 
> history up to that point, so it will be the same in all members of the 
> quorum as long as they have the same transaction history. That means there 
> is no need to send any new information between members of the quorum 
> during the broadcast phase. The leader will check the hash value when 
> learners try to connect, and it shall also be sent together with the 
> snapshot during synchronisation. If the hash value does not match, the 
> synchronisation shall be done with a snapshot in order to overwrite the 
> conflicts in the transaction history.
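
The check described above, done by the leader when a learner connects, could 
look roughly like the following Python sketch. This is an illustration under 
stated assumptions, not the attached patch: `HistoryEntry`, `append_txn` and 
`choose_sync` are hypothetical names, SHA-256 is an assumed hash function, and 
the strings "DIFF" and "SNAP" only loosely borrow the names of ZooKeeper's 
synchronisation modes.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class HistoryEntry:
    zxid: int            # transaction id
    payload: bytes       # serialized transaction (format assumed)
    chained_hash: bytes  # hash covering this and all earlier entries


def append_txn(history, zxid, payload):
    """Append a transaction, chaining its hash to the previous entry."""
    prev = history[-1].chained_hash if history else b"\x00" * 32
    data = prev + zxid.to_bytes(8, "big") + payload
    history.append(HistoryEntry(zxid, payload, hashlib.sha256(data).digest()))


def choose_sync(leader_history, learner_zxid, learner_hash):
    """Decide how to synchronise a connecting learner.

    "DIFF": the learner's last hash matches the leader's at that zxid,
            so sending the missing transactions is enough.
    "SNAP": the hash differs or the zxid is unknown to the leader,
            so a snapshot overwrites the learner's history.
    """
    for entry in leader_history:
        if entry.zxid == learner_zxid:
            return "DIFF" if entry.chained_hash == learner_hash else "SNAP"
    return "SNAP"


leader = []
for zxid, payload in [(1, b"create /a"), (2, b"set /a 1"), (3, b"delete /a")]:
    append_txn(leader, zxid, payload)

# A learner that stopped after zxid 2 with the same history as the leader.
learner = []
append_txn(learner, 1, b"create /a")
append_txn(learner, 2, b"set /a 1")
mode = choose_sync(leader, learner[-1].zxid, learner[-1].chained_hash)
```

Because every hash covers the previous one, comparing a single value at the 
learner's last zxid is enough to cover its whole history, and nothing extra 
has to be exchanged during the broadcast phase. The sketch leaves out the case 
of a learner with an empty history.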



--
This message was sent by Atlassian JIRA
(v6.2#6252)
