[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968219#comment-13968219
 ] 

Flavio Junqueira commented on ZOOKEEPER-1794:
---------------------------------------------

Ok, this is better. We need to converge on what we are trying to achieve before we 
decide to go one way or the other. I think this should be optional at least. 

Now, the tricky part is that what this seems to be doing does not provide 
strong guarantees and might make the semantics of the service confusing. 
For example, here is a quote from the description:

bq.  If the hash number does not match, the synchronization shall be done with 
a snapshot in order to overwrite the conflicts in the transaction history.

If there has been any corruption, I'm not convinced we should keep going. 
Raising an alarm sounds like a good thing, but pulling a snapshot from some 
other server in the presence of some bug or unforeseen event seems dangerous.

In general, I'm not really in favor of adding mechanisms without really 
understanding what we're trying to achieve. Adding a detection mechanism for 
corrupt state sounds good, but what exactly is causing this corruption? Is it 
our own bugs? If so, it sounds like a circular problem... :-)   

> Add hash check to transaction history in quorum servers
> -------------------------------------------------------
>
>                 Key: ZOOKEEPER-1794
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1794
>             Project: ZooKeeper
>          Issue Type: Sub-task
>          Components: quorum
>    Affects Versions: 3.5.0
>            Reporter: Germán Blanco
>            Assignee: Germán Blanco
>             Fix For: 3.6.0
>
>         Attachments: ZOOKEEPER-1794.patch, ZOOKEEPER-1794.patch, 
> ZOOKEEPER-1794.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> The goal of this task is to add a hash number to each transaction in the 
> transaction history. This hash number will depend on the entire transaction 
> history. This hash number will be the same in all members of the quorum, 
> since it shall have the same result if the members have the same transaction 
> history. That means that there will be no need to send any new information 
> between members of the quorum, during the broadcast phase. The hash number 
> will be checked by the leader when learners try to connect, and it shall also 
> be sent together with the snapshot during synchronisation. If the hash number 
> does not match, the synchronisation shall be done with a snapshot in order to 
> overwrite the conflicts in the transaction history.
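
To make the description above concrete, here is a minimal sketch of a hash that 
depends on the entire transaction history. It is not taken from the attached 
patch. The class name TxnHashChain, the choice of SHA-256, the all-zero 
starting value, and the (zxid, payload) encoding are all assumptions made for 
illustration; the description only says "hash number".

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of ZooKeeper: keeps a running hash over a
// transaction history so two servers can compare histories cheaply.
final class TxnHashChain {
    // Running hash; 32 zero bytes stand for the empty history (assumption).
    private byte[] current = new byte[32];

    // Fold one transaction into the running hash:
    //   h_i = SHA-256(h_{i-1} || zxid_i || payload_i)
    // Because h_{i-1} is an input, h_i depends on every earlier transaction.
    void append(long zxid, byte[] payload) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(current);
            md.update(ByteBuffer.allocate(Long.BYTES).putLong(zxid).array());
            md.update(payload);
            current = md.digest();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    // Copy of the current hash value, e.g. to send along with a snapshot.
    byte[] value() {
        return current.clone();
    }

    // Check a leader could run when a learner connects. This sketch assumes
    // both chains have been advanced to the same zxid before comparing.
    static boolean matches(TxnHashChain leader, TxnHashChain learner) {
        return MessageDigest.isEqual(leader.current, learner.current);
    }
}
```

Two servers that apply the same transactions in the same order end up with the 
same value without exchanging anything during broadcast. What to do when 
matches() returns false (raise an alarm, stop, or overwrite with a snapshot) is 
a separate policy question that this sketch does not settle.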



