[jira] Commented: (QPID-2157) Persistent cluster restart

Ken Giusti (JIRA) Fri, 23 Oct 2009 08:29:23 -0700

    [ https://issues.apache.org/jira/browse/QPID-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769257#action_12769257 ]


Ken Giusti commented on QPID-2157:
----------------------------------

Looks good.  Minor questions/notes (remember, I'm a cluster noob, so feel free 
to reject these if common knowledge makes them obvious) - thx.

Section Tests, last bullet item:

>start 2 clusters, shut down. Attempt to re-start members as a single cluster. 
>Unrelated stores detected and cluster re-start exits with error.

Section Design:

? Should we always assume that --cluster-store-count also indicates the 
minimum number of cluster members to wait for before the cluster selects 
persistent state?  For example, if I have 5 persistent members, must all 5 be 
available before a store is selected and propagated?  Would it be useful to 
recover state if, say, 3 persistent members become ready, all have matching 
clean stores, and we time out waiting for the remaining two?  Should we 
provide an optional --cluster-store-min?  If not, what is the behavior on 
timeout if fewer than --cluster-store-count members are available?
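
To make the question concrete, here is a minimal Python sketch of the decision 
being asked about. It is not taken from the design note or the broker code. 
The Member type, the store_id field, and the store_min parameter (standing in 
for the proposed, not existing, --cluster-store-min) are all assumptions made 
for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Member:
    name: str
    clean: bool     # True if the member's store was shut down cleanly
    store_id: str   # hypothetical identifier for the stored state


def ready_to_select(members: Sequence[Member],
                    store_count: int,
                    store_min: Optional[int] = None,
                    timed_out: bool = False) -> bool:
    """Return True if store selection may proceed (illustrative only)."""
    # Normal case: every expected persistent member has joined.
    if len(members) >= store_count:
        return True
    # Without a timeout, or without a configured minimum, keep waiting.
    if not timed_out or store_min is None:
        return False
    # After a timeout: proceed only if enough members are ready and all of
    # them hold matching clean stores.
    if len(members) < store_min:
        return False
    all_clean = all(m.clean for m in members)
    matching = len({m.store_id for m in members}) == 1
    return all_clean and matching
```

With 5 expected members and 3 ready members holding matching clean stores, 
this returns True only once the timeout has fired and store_min is 3 or less. 
What should happen when store_min is not set is exactly the open question.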

Section Startup:

>If no member has a clean store: member with highest frame-sequence loads store 
>and provides updates.

? Would it be safer to require manual intervention in this case as the default 
behaviour?
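
A rough Python sketch of the alternative suggested here: manual intervention 
is the default, and picking the highest frame-sequence is opt-in. The 
StoreInfo type, the auto_recover flag, and the exception name are hypothetical 
and do not come from the design note.

```python
from dataclasses import dataclass
from typing import Sequence


class ManualInterventionRequired(Exception):
    """Raised when the cluster should not pick a store automatically."""


@dataclass(frozen=True)
class StoreInfo:
    member: str
    clean: bool
    frame_seq: int  # last frame sequence number recorded in the store


def pick_when_none_clean(stores: Sequence[StoreInfo],
                         auto_recover: bool = False) -> StoreInfo:
    """Choose a store when no member has a clean one (illustrative only)."""
    if not stores:
        raise ValueError("no stores to choose from")
    if any(s.clean for s in stores):
        raise ValueError("only applies when no member has a clean store")
    if not auto_recover:
        # Suggested default: stop and let an operator decide.
        raise ManualInterventionRequired("no clean store available")
    # Behaviour quoted from the design note: highest frame-sequence wins.
    return max(stores, key=lambda s: s.frame_seq)
```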

? The following statement (under stored state):

>Orderly shutdown means qpid-cluster -k, or any shutdown for last [persistent] 
>member cluster.

implies that cluster members can shut down at different times - some later 
than others.  The tests seem to imply this, too, and that any later state is 
recovered.  If true, then:

>If some members have a clean store:
>    * compare stored state, if any mismatch then manual intervention required


Shouldn't a mismatch on just the frame seq # still be allowed, selecting the 
store with the "oldest" sequence number as the one to recover?

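The relaxed comparison asked about above could look roughly like this Python 
sketch. The CleanStore type is hypothetical, and cluster_id is only a stand-in 
for whatever stored state other than the frame sequence is compared. "Oldest" 
is read here as the lowest frame sequence number, which is an assumption about 
the intent; pass pick=max to try the opposite choice.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Sequence


class ManualInterventionRequired(Exception):
    """Raised when clean stores disagree on more than the frame sequence."""


@dataclass(frozen=True)
class CleanStore:
    member: str
    cluster_id: str  # stand-in for all stored state except frame_seq
    frame_seq: int


def select_clean_store(stores: Sequence[CleanStore],
                       pick: Callable[[Iterable[int]], int] = min
                       ) -> CleanStore:
    """Pick a clean store, tolerating frame-sequence mismatches only."""
    if not stores:
        raise ValueError("no clean stores to compare")
    # Any mismatch in the non-sequence state still needs an operator.
    if len({s.cluster_id for s in stores}) != 1:
        raise ManualInterventionRequired("clean stores are unrelated")
    # A frame-sequence mismatch alone is allowed: choose one by policy.
    chosen_seq = pick(s.frame_seq for s in stores)
    return next(s for s in stores if s.frame_seq == chosen_seq)
```
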
thx,

-K    

> Persistent cluster restart
> --------------------------
>
>                 Key: QPID-2157
>                 URL: https://issues.apache.org/jira/browse/QPID-2157
>             Project: Qpid
>          Issue Type: Bug
>          Components: C++ Broker
>    Affects Versions: 0.5
>            Reporter: Alan Conway
>            Assignee: Alan Conway
>
> Currently, when restarting a persistent cluster, the first broker to start 
> loads from its store and all other brokers move their store aside and update 
> from the cluster.  If some brokers failed and have out-of-date stores, we 
> assume manual intervention to ensure that the correct broker is started first.
> The goal is to have the brokers automatically compare their stores, allowing 
> all brokers with clean stores to load from store and all other brokers to 
> update from the cluster.
> A design note for this issue is at 
> http://cwiki.apache.org/confluence/display/qpid/Persistent+Cluster+Restart+Design+Note



---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:[email protected]
