[ https://issues.apache.org/jira/browse/ZOOKEEPER-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968485#comment-13968485 ]
Germán Blanco commented on ZOOKEEPER-1794:
------------------------------------------
This (recovering from an existing snapshot in the cluster) is the best behavior
for the use case I had at hand, but it might not be the best in all cases.
So, from this discussion it seems that the goals are:
- enable the best possible tracing of the problem -> To find the root cause,
we could raise an alarm and log as much information as possible.
- ensure consistency between the servers and also with the clients -> If a
mismatch is detected, the servers need to "touch base" and, at the very least,
clients need to lose their sessions.
- keep the service up and running -> Optionally (as a further option on top of
the option to enable this verification at all), the system could stop, start
with no data, or recover from an available snapshot.
How does that sound?
Just to clarify where I'm coming from: in my use case, part of the data was
static and basically never changed, while the rest was short-lived, changed
frequently, and had little value once it was older than, say, 5 minutes.
Recovering from a snapshot restored the static information, and when clients
lost their sessions and reconnected, they refreshed their state and everything
continued working normally.
> Add hash check to transaction history in quorum servers
> -------------------------------------------------------
>
> Key: ZOOKEEPER-1794
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1794
> Project: ZooKeeper
> Issue Type: Sub-task
> Components: quorum
> Affects Versions: 3.5.0
> Reporter: Germán Blanco
> Assignee: Germán Blanco
> Fix For: 3.6.0
>
> Attachments: ZOOKEEPER-1794.patch, ZOOKEEPER-1794.patch,
> ZOOKEEPER-1794.patch
>
> Original Estimate: 336h
> Remaining Estimate: 336h
>
> The goal of this task is to add a hash number to each transaction in the
> transaction history. This hash number will depend on the entire transaction
> history. It will be the same in all members of the quorum, since members
> with the same transaction history will compute the same result. This means
> that no additional information needs to be sent between members of the
> quorum during the broadcast phase. The hash number will be checked by the
> leader when learners try to connect, and it shall also be sent together with
> the snapshot during synchronisation. If the hash number does not match, the
> synchronisation shall be done with a snapshot in order to overwrite the
> conflicts in the transaction history.
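A minimal sketch of the idea in the description, assuming a simple hash chain: each transaction's hash is SHA-256 over the previous hash concatenated with the serialized transaction, so the value depends on every transaction and on their order, and servers with identical histories compute identical values without exchanging anything extra. The function names, the zero seed, the zxid-to-hash map, and the use of SHA-256 are all assumptions for illustration, not the design in the attached patches. The `"DIFF"` / `"SNAP"` labels echo ZooKeeper's sync modes, but the decision logic here is simplified (truncation is ignored).

```python
import hashlib


def chain_hash(prev_hash: bytes, txn: bytes) -> bytes:
    """Fold one serialized transaction into the running history hash.

    prev_hash is always 32 bytes, so prev_hash + txn is unambiguous.
    """
    return hashlib.sha256(prev_hash + txn).digest()


def history_hash(txns, seed: bytes = b"\x00" * 32) -> bytes:
    """Hash of a whole transaction history; depends on every txn and its order."""
    h = seed
    for txn in txns:
        h = chain_hash(h, txn)
    return h


def choose_sync(leader_hashes: dict, learner_zxid: int, learner_hash: bytes) -> str:
    """Hypothetical leader-side check when a learner connects.

    leader_hashes maps zxid -> history hash as recorded by the leader.
    If the leader's hash at the learner's last zxid matches, sending the
    missing transactions is enough; otherwise the histories have diverged
    and a full snapshot overwrites the conflicting history.
    """
    if leader_hashes.get(learner_zxid) == learner_hash:
        return "DIFF"
    return "SNAP"
```

Because only the running hash is kept per transaction, a mismatch reveals that the histories diverged somewhere at or before that zxid, but not where; that is why the fallback is a full snapshot rather than a targeted repair.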