[ https://issues.apache.org/jira/browse/QPID-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13262897#comment-13262897 ]

jirapos...@reviews.apache.org commented on QPID-3963:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4846/
-----------------------------------------------------------

(Updated 2012-04-26 19:19:31.447871)


Review request for qpid, Alan Conway and Gordon Sim.


Changes
-------

This patch should be final - I've got more testing to do and that might result 
in some changes, but consider this my solution for QPID-3963.

I will also add a unit test to verify the fix, but that is still TBD.

In summary:

1) Each broker Link attempts to subscribe to the amq.failover exchange on the 
remote broker.
2) The set of failover URLs learned from the remote broker is replicated within 
the local cluster when a new member is added.
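The two parts of the fix can be illustrated with a small model. The sketch below assumes a hypothetical `ClusterMember` that keeps, per Link, the URL list most recently received from the remote amq.failover exchange (part 1). It copies that state to a joining member (part 2). `ClusterMember`, `LinkFailoverState`, `onFailoverUpdate` and `replicateTo` are illustrative names only, not Qpid's classes or API.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical per-Link state (NOT Qpid's Link class): the failover URLs
// learned from the remote broker's amq.failover exchange.
struct LinkFailoverState {
    std::vector<std::string> failoverUrls;
};

// Hypothetical model of one member of the local cluster.
class ClusterMember {
public:
    // Part 1: the Link received a new URL list from the remote amq.failover
    // exchange; the newest list replaces whatever was known before.
    void onFailoverUpdate(const std::string& linkName,
                          const std::vector<std::string>& urls) {
        assert(!linkName.empty());
        links_[linkName].failoverUrls = urls;
    }

    // Part 2: a new member is joining; copy every Link's learned URLs to it
    // so it starts with the same view of the remote cluster as this member.
    void replicateTo(ClusterMember& newcomer) const {
        for (const auto& entry : links_) {
            newcomer.links_[entry.first] = entry.second;
        }
    }

    // Returns the learned URLs for a Link, or an empty list if unknown.
    std::vector<std::string> urlsFor(const std::string& linkName) const {
        auto it = links_.find(linkName);
        if (it == links_.end()) return {};
        return it->second.failoverUrls;
    }

private:
    std::map<std::string, LinkFailoverState> links_;
};
```

The diff touches `UpdateClient.cpp` and `cluster.xml`, so the patch appears to carry this state over the cluster's new-member update path. The sketch does not model that protocol, only the effect: after `replicateTo`, the joining member reports the same URLs for each Link.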

I've also tried to apply most of the comments from the last review.

Thanks, -K


Summary
-------

Still a WIP, but I wanted early feedback as I'm not too experienced with the 
subscription management code involved (completely stolen from Alan).

This patch allows the Link to subscribe to the remote broker's amq.failover 
exchange, if it exists.  This lets the Link be updated dynamically if the remote 
broker is part of a cluster and that cluster's membership changes.
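The "subscribe if it exists" behaviour can be sketched as follows, assuming a hypothetical `LinkModel` and a hypothetical `SubscribeResult` that reports whether the subscription succeeded. Neither is Qpid's API. The Link starts with its configured address. If the exchange exists, each membership update replaces the failover list. If the remote is not clustered, the Link keeps its static list. Refusing to replace the list with an empty one is a design choice of this sketch, not something stated in the patch.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical outcome of subscribing to the remote amq.failover exchange;
// illustrative names, not part of Qpid's API.
enum class SubscribeResult { Subscribed, ExchangeNotFound };

// Hypothetical Link model: starts with the address it was configured with
// and, only if the subscription succeeded, replaces its failover list each
// time the remote publishes a new cluster membership.
class LinkModel {
public:
    explicit LinkModel(std::string configuredUrl)
        : urls_{std::move(configuredUrl)} {
        assert(!urls_.front().empty());
    }

    void onSubscribeResult(SubscribeResult result) {
        // A non-clustered remote has no amq.failover exchange; the Link keeps
        // its static address list and otherwise carries on as before.
        dynamic_ = (result == SubscribeResult::Subscribed);
    }

    void onMembershipUpdate(const std::vector<std::string>& urls) {
        // Ignore updates when not subscribed; also never swap in an empty
        // list (a choice made for this sketch only).
        if (dynamic_ && !urls.empty()) urls_ = urls;
    }

    bool isDynamic() const { return dynamic_; }
    const std::vector<std::string>& urls() const { return urls_; }

private:
    std::vector<std::string> urls_;
    bool dynamic_ = false;
};
```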

Light testing against a cluster confirms that this patch fixes QPID-3963.  
Testing against a non-clustered remote broker causes the remote to log the 
following error, but everything otherwise behaves correctly:

2012-04-23 16:45:27 error Execution exception: not-found: Exchange not found: amq.failover (../../../qpid/cpp/src/qpid/broker/ExchangeRegistry.cpp:101)


This addresses bug qpid-3963.
    https://issues.apache.org/jira/browse/qpid-3963


Diffs (updated)
---------------

  /trunk/qpid/cpp/xml/cluster.xml 1329301 
  /trunk/qpid/cpp/src/qpid/cluster/Connection.h 1329301 
  /trunk/qpid/cpp/src/qpid/cluster/Connection.cpp 1329301 
  /trunk/qpid/cpp/src/qpid/cluster/UpdateClient.cpp 1329301 
  /trunk/qpid/cpp/src/qpid/broker/LinkRegistry.cpp 1329301 
  /trunk/qpid/cpp/src/qpid/broker/Link.cpp 1329301 
  /trunk/qpid/cpp/src/qpid/broker/ExchangeRegistry.cpp 1329301 
  /trunk/qpid/cpp/src/qpid/broker/Link.h 1329301 

Diff: https://reviews.apache.org/r/4846/diff


Testing
-------

Minimal.


Thanks,

Kenneth


                
> A federated broker may not reconnect to a remote cluster on link failure.
> -------------------------------------------------------------------------
>
>                 Key: QPID-3963
>                 URL: https://issues.apache.org/jira/browse/QPID-3963
>             Project: Qpid
>          Issue Type: Bug
>          Components: C++ Broker, C++ Clustering
>    Affects Versions: 0.14
>            Reporter: Ken Giusti
>            Assignee: Ken Giusti
>
> When a broker is federated with a cluster, the cluster informs the broker of 
> the failover addresses that are valid for the cluster.  Should a cluster 
> member fail, the broker will reconnect to another member of that cluster.
> However, the federated broker only queries the cluster for these failover 
> addresses when it first connects to the cluster.  Should the cluster topology 
> change, the federated broker's list of available failover addresses will 
> become out-of-date.  This can prevent the broker from correctly re-connecting 
> on failure of a cluster member.
> Example:
> Given a cluster with members C1 and C2, and a separate broker B, federate B to
> connect to C1.  On connecting to C1, B learns the address of C2 as an
> alternate failover address.  Now shut down C1.  B will reconnect to C2 and
> learn that C2 is the only member of the cluster (i.e. no failover addresses).
> After B connects, restart C1 and let it join the cluster.  Then shut down C2.
> Since B does not know that C1 has become available again, B will not
> attempt to reconnect to it.  Instead, it tries to reconnect to C2
> indefinitely.
> The expected behavior would be to have B reconnect to C1.
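The example above can be replayed with a small simulation that contrasts "learn members only on connect" with "receive membership updates". It is a hypothetical model, not Qpid's reconnect code. `Cluster`, `FederatedBroker` and `reconnectsToC1` are illustrative names, member names stand in for URLs, and failover simply tries the known addresses in order.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of the QPID-3963 example; none of these names are Qpid's.
// The remote cluster is just the set of members that are currently running.
struct Cluster {
    std::set<std::string> running;

    std::vector<std::string> members() const {
        return std::vector<std::string>(running.begin(), running.end());
    }
};

// Broker B's view of the cluster. With dynamicUpdates == false, B learns the
// member list only when it (re)connects, as described in the bug report.
struct FederatedBroker {
    bool dynamicUpdates;
    std::string connectedTo;
    std::vector<std::string> known;  // failover addresses B believes are valid

    void connect(const Cluster& c, const std::string& member) {
        connectedTo = member;
        known = c.members();  // the cluster reports its members on connect
    }

    // Membership changed on the remote; only a subscribed B hears about it.
    void onMembershipChange(const Cluster& c) {
        if (dynamicUpdates) known = c.members();
    }

    // The connection was lost: try the known addresses in order. Iterates
    // over a copy because connect() replaces `known`. Returns false if none
    // is running (B would keep retrying in vain).
    bool failover(const Cluster& c) {
        const std::vector<std::string> candidates = known;
        for (const std::string& url : candidates) {
            if (c.running.count(url) != 0) {
                connect(c, url);
                return true;
            }
        }
        return false;
    }
};

// Replays the example from the issue; returns true if B ends up on C1.
bool reconnectsToC1(bool dynamicUpdates) {
    Cluster cluster;
    cluster.running.insert("C1");
    cluster.running.insert("C2");

    FederatedBroker b{dynamicUpdates, "", {}};
    b.connect(cluster, "C1");                // B learns C1 and C2

    cluster.running.erase("C1");             // shut down C1
    b.onMembershipChange(cluster);
    if (!b.failover(cluster)) return false;  // B moves to C2

    cluster.running.insert("C1");            // C1 restarts and rejoins
    b.onMembershipChange(cluster);

    cluster.running.erase("C2");             // shut down C2
    b.onMembershipChange(cluster);
    return b.failover(cluster) && b.connectedTo == "C1";
}
```

`reconnectsToC1(false)` returns false because B's list still holds only C2. `reconnectsToC1(true)` returns true, which matches the expected behavior stated in the issue.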

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
