[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968485#comment-13968485
 ] 

Germán Blanco commented on ZOOKEEPER-1794:
------------------------------------------

This (recovering from an existing snapshot in the cluster) is the best behavior 
for the use case that I had at hand, but it might not be the best in all cases.
So, from this discussion it seems that the goals are:
- enable the best possible tracing of the problem -> In order to find the root 
of the problem, we could raise an alarm and log as much information as possible.
- ensure consistency between the servers and also with the clients -> If this 
happens, servers need to "touch base" and, at the very least, clients need to 
lose their sessions.
- keep the service up and running -> Optionally (an option on top of the 
option to have this verification at all), the system could either stop, start 
with no data, or recover from an available snapshot.
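
To make the three goals concrete, here is a rough sketch of how the choice 
could be expressed. Every name in it (MismatchAction, plan_for_mismatch, the 
step strings) is hypothetical and made up for illustration; none of it comes 
from the attached patch or from existing ZooKeeper configuration.

```python
from enum import Enum


class MismatchAction(Enum):
    """Hypothetical setting: what a server does after a history mismatch."""
    STOP = "stop"
    START_EMPTY = "start_empty"
    RECOVER_FROM_SNAPSHOT = "recover_from_snapshot"


def plan_for_mismatch(check_enabled, action):
    """Return the ordered steps to take when a mismatch is detected.

    check_enabled models the outer option (run the verification at all);
    action models the inner option (how to keep or stop the service).
    """
    if not check_enabled:
        # Verification is switched off, so nothing is ever detected.
        return []

    # Goals 1 and 2 from the list above: trace the problem, then restore
    # consistency between servers and with clients.
    steps = [
        "raise alarm and log details",
        "resynchronise servers",
        "expire client sessions",
    ]

    # Goal 3: the configurable part.
    final_step = {
        MismatchAction.STOP: "stop the server",
        MismatchAction.START_EMPTY: "start with no data",
        MismatchAction.RECOVER_FROM_SNAPSHOT: "recover from an available snapshot",
    }[action]
    return steps + [final_step]


if __name__ == "__main__":
    for step in plan_for_mismatch(True, MismatchAction.RECOVER_FROM_SNAPSHOT):
        print(step)
```

The point of returning a plan instead of acting directly is only to show that 
the alarm and the session expiry happen in every case, and that just the last 
step depends on the setting.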

How does that sound?

Just to clarify where I am coming from: in my use case, part of the data was 
static information that basically never changed, and the rest was short-lived 
information that changed frequently and had little value once it was older 
than, say, 5 minutes. Recovering from a snapshot recreated the static 
information, and when clients lost their sessions and reconnected, they 
refreshed their state and everything continued working normally.

> Add hash check to transaction history in quorum servers
> -------------------------------------------------------
>
>                 Key: ZOOKEEPER-1794
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1794
>             Project: ZooKeeper
>          Issue Type: Sub-task
>          Components: quorum
>    Affects Versions: 3.5.0
>            Reporter: Germán Blanco
>            Assignee: Germán Blanco
>             Fix For: 3.6.0
>
>         Attachments: ZOOKEEPER-1794.patch, ZOOKEEPER-1794.patch, 
> ZOOKEEPER-1794.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> The goal of this task is to add a hash number to each transaction in the 
> transaction history. This hash number will depend on the entire transaction 
> history. This hash number will be the same in all members of the quorum, 
> since it shall have the same result if the members have the same transaction 
> history. That means that there will be no need to send any new information 
> between members of the quorum, during the broadcast phase. The hash number 
> will be checked by the leader when learners try to connect, and it shall also 
> be sent together with the snapshot during synchronisation. If the hash number 
> does not match, the synchronisation shall be done with a snapshot in order to 
> overwrite the conflicts in the transaction history.
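
The description above amounts to a hash chain over the transaction history. 
The following is only a minimal sketch of that idea, not the code in the 
attached patch. The function names, the use of SHA-256, the all-zero seed and 
the byte-string transactions are all assumptions made for illustration.

```python
import hashlib

# Assumed starting value for an empty history (32 zero bytes).
SEED = b"\x00" * 32


def chain_hash(prev_hash, txn):
    """Hash that depends on the whole history up to and including txn."""
    # prev_hash has a fixed length, so the concatenation is unambiguous.
    return hashlib.sha256(prev_hash + txn).digest()


def history_hashes(txns, seed=SEED):
    """Return one chained hash per transaction, in order."""
    hashes = []
    current = seed
    for txn in txns:
        current = chain_hash(current, txn)
        hashes.append(current)
    return hashes


def needs_snapshot_sync(leader_hashes, learner_index, learner_hash):
    """Leader-side check when a learner connects.

    learner_index is the position of the learner's last transaction and
    learner_hash is the chained hash the learner stored for it. In this
    sketch an index the leader does not have counts as a mismatch.
    """
    if learner_index < 0 or learner_index >= len(leader_hashes):
        return True
    return leader_hashes[learner_index] != learner_hash


if __name__ == "__main__":
    leader = history_hashes([b"create /a", b"set /a 1", b"create /b"])
    in_sync = history_hashes([b"create /a", b"set /a 1"])
    diverged = history_hashes([b"create /a", b"set /a 2"])
    print(needs_snapshot_sync(leader, 1, in_sync[1]))
    print(needs_snapshot_sync(leader, 1, diverged[1]))
```

Because each hash is computed locally from the previous hash and the 
transaction itself, two servers with the same history arrive at the same 
value without exchanging anything extra during broadcast. The comparison is 
only needed when a learner connects.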



--
This message was sent by Atlassian JIRA
(v6.2#6252)
