[ https://issues.apache.org/jira/browse/QPID-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769257#action_12769257 ]
Ken Giusti commented on QPID-2157:
----------------------------------
Looks good. A few minor questions/notes (remember, I'm a cluster noob, so feel
free to reject these if common knowledge makes them obvious). Thx.
Section Tests, last bullet item:
>start 2 clusters, shut down. Attempt to re-start members as a single cluster.
>Unrelated stores detected and cluster re-start exits with error.
Section Design:
? Should we always assume that --cluster-store-count also indicates the
minimum number of cluster members to wait for before the cluster
selects persistent state? For example, if I have 5 persistent members, must
all 5 be available before a store is selected and propagated? Would it be
useful to recover state if, say, 3 persistent members become ready, all have
matching clean stores, and we time out waiting for the remaining two? Should we
provide an optional --cluster-store-min? If not, what is the behavior on
timeout if fewer than --cluster-store-count members are available?
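To make the question concrete, here is a minimal Python sketch of one possible
timeout policy. It is only an illustration: --cluster-store-min is the proposed
option, not an existing one, and StoreInfo, select_on_timeout and the field
names are hypothetical, not taken from the design note or the broker code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StoreInfo:
    """Hypothetical summary of one persistent member's store."""
    member: str
    clean: bool
    frame_seq: int


def select_on_timeout(ready: List[StoreInfo], store_count: int,
                      store_min: Optional[int] = None) -> Optional[StoreInfo]:
    """Return the store to recover from when the wait times out.

    Returns None when no store may be selected, i.e. the case whose
    behavior the question above asks about. store_min models the
    proposed (not existing) --cluster-store-min option.
    """
    required = store_count if store_min is None else store_min
    if len(ready) < required:
        return None  # too few persistent members were ready at timeout
    clean = [s for s in ready if s.clean]
    # Recover early only if every ready member is clean and all agree.
    if len(clean) == len(ready) and len({s.frame_seq for s in clean}) == 1:
        return clean[0]
    return None
```

With 3 of 5 members ready and matching, this returns None when only
--cluster-store-count=5 is given, and a store when store_min=3 is also given.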
Section Startup:
>If no member has a clean store: member with highest frame-sequence loads store
>and provides updates.
? Would it be safer to require manual intervention in this case as the default
behaviour?
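As a sketch of what "manual intervention by default" could look like, the
Python below picks the member with the highest frame-sequence only when
explicitly allowed. The function name, the exception and the
allow_dirty_recovery switch are all hypothetical, chosen just to illustrate
the question.

```python
from typing import Dict


class ManualInterventionRequired(Exception):
    """Raised when no store may be chosen automatically."""


def pick_without_clean_store(frame_seqs: Dict[str, int],
                             allow_dirty_recovery: bool = False) -> str:
    """Choose a member to load from when no member has a clean store.

    frame_seqs maps member name -> stored frame-sequence.
    allow_dirty_recovery is a hypothetical opt-in switch; by default the
    sketch refuses and asks for manual intervention.
    """
    if not frame_seqs:
        raise ValueError("no persistent members reported a store")
    if not allow_dirty_recovery:
        raise ManualInterventionRequired(
            "no clean store found; operator must choose a store")
    # Rule from the design note: highest frame-sequence loads its store.
    return max(frame_seqs, key=frame_seqs.get)
```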
? The following statement (under stored state):
>Orderly shutdown means qpid-cluster -k, or any shutdown for last [persistent]
>member cluster.
implies that cluster members can shut down at different times, some later than
others. The tests seem to imply this too, and that any later state is recovered.
If true, then:
>If some members have a clean store:
> * compare stored state, if any mismatch then manual intervention required
Shouldn't a mismatch on just the frame sequence number still be allowed,
selecting the store with the "oldest" sequence number as the one to recover?
thx,
-K
> Persistent cluster restart
> --------------------------
>
> Key: QPID-2157
> URL: https://issues.apache.org/jira/browse/QPID-2157
> Project: Qpid
> Issue Type: Bug
> Components: C++ Broker
> Affects Versions: 0.5
> Reporter: Alan Conway
> Assignee: Alan Conway
>
> Currently, when restarting a persistent cluster, the first broker to start
> loads from its store and all other brokers move their store aside and update
> from the cluster. If some brokers failed and have out-of-date stores, we
> assume manual intervention to ensure that the correct broker is started first.
> The goal is to have the brokers automatically compare their stores, allowing
> all brokers with clean stores to load from store and all other brokers to
> update from the cluster.
> A design note for this issue is at
> http://cwiki.apache.org/confluence/display/qpid/Persistent+Cluster+Restart+Design+Note
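
The difference between the current behaviour and the goal in the description
above can be sketched in Python as follows. This only illustrates the two
rules as stated; the function names and role labels are hypothetical, and the
case where no broker has a clean store is deliberately left out.

```python
from typing import Dict, List, Set

LOAD = "load-from-store"
UPDATE = "update-from-cluster"


def current_plan(start_order: List[str]) -> Dict[str, str]:
    """Current rule: the first broker started loads from its store; every
    other broker moves its store aside and updates from the cluster."""
    return {b: (LOAD if i == 0 else UPDATE)
            for i, b in enumerate(start_order)}


def goal_plan(start_order: List[str], clean: Set[str]) -> Dict[str, str]:
    """Goal: every broker with a clean store loads from it; all others
    update from the cluster, regardless of start order."""
    if not clean.intersection(start_order):
        raise ValueError("no clean store: not covered by this sketch")
    return {b: (LOAD if b in clean else UPDATE) for b in start_order}
```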