[ https://issues.apache.org/jira/browse/QPID-5409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13878595#comment-13878595 ]
ASF subversion and git services commented on QPID-5409:
-------------------------------------------------------

Commit 1560333 from [~k-wall] in branch 'qpid/branches/java-broker-bdb-ha'
[ https://svn.apache.org/r1560333 ]

QPID-5409: Refactoring to move commit thread back to BDBMessageStore.

> [Java Broker] Add support for multi-node HA cluster into BDB JE HA message store
> --------------------------------------------------------------------------------
>
>                 Key: QPID-5409
>                 URL: https://issues.apache.org/jira/browse/QPID-5409
>             Project: Qpid
>          Issue Type: Improvement
>          Components: Java Broker
>            Reporter: Alex Rudyy
>             Fix For: 0.27
>
>
> The Java Qpid Broker currently supports only two-node HA clusters with the
> BDB JE message store. This JIRA aims to extend the current HA functionality
> and add support for multi-node HA clusters to the BDB JE message store.
> Here is the list of high-level requirements for the HA multi-node support:
> # Only persistent messages are to be replicated. Transient messages will be
> lost on failover.
> # System must support clusters formed of an arbitrary number of nodes.
> # System must continue to support clusters formed of two nodes. Existing
> public interfaces (notably JMX and the existing virtualhost.xml format) must
> be retained (though may be deprecated) to allow existing users a convenient
> upgrade path.
> # System must allow a user to completely configure a new node via the
> web-management interface.
> # System must allow a user to monitor the nodes of the cluster in order to
> ascertain its health and perform day-to-day operations via the
> web-management interface.
> ## expose statistics to allow the health of the cluster to be established
> (exact details to be determined: could be low level like DbPing or something
> more abstract)
> # System must be amenable to monitoring by third-party tools. The Broker
> should emit clear operational log messages as it transitions between states.
> These messages will be targeted at the end-user.
> # System must permit a mode of operation whereby the user (or other external
> agent) determines which virtual host becomes active following a store
> failover. In this mode of operation, following a failure, the store nodes in
> the cluster will elect a new master and the replica node will still sync up
> with the new master node, but the system will not automatically mark the
> virtual host corresponding to the master as active. The user will then
> transfer the master to the desired location and mark the virtual host as
> active, allowing business traffic to recommence.
> # System must permit a mode of operation whereby the election of a store node
> as master also causes the corresponding virtual host to become active.
> # System must allow a user to influence a node's electability. These
> features will allow a customer whose cluster spans primary/DR sites to keep
> the active virtual hosts on the primary site in normal situations by
> favouring failover within the primary site. Specifically, this will allow:
> ## making a node unelectable - the node, even though it remains part of the
> cluster, will never be elected as master
> ## making a node more likely to be elected master than other nodes - that
> is, if two or more nodes have an equally up-to-date set of transactions, the
> node with the highest priority will be elected master
> ## node electability settings must survive restart
> ## node electability settings must be alterable at run-time
> # System must allow a user to alter quorum. This feature is required in
> extraordinary situations where the system is required to continue to operate
> despite the loss of so many nodes that there is no longer a simple majority.
> # System must allow the user to move the active virtual host from one node to
> another. This feature will help a user to restore a system to its BAU state
> following an extraordinary situation.
> # System must provide a user with a read-only view of queues when the
> underlying store is in a replica state. This must provide at least queue name
> and queue depth. This feature will allow a user to see that replication is
> indeed functioning.
> # System must allow all HA operations to be allowed/denied according to rules
> in the ACL.
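
The electability and quorum requirements in the list above can be illustrated
with a small sketch. This is not the broker's or BDB JE's election code; it is
a minimal Python model under stated assumptions. The names Node, has_quorum
and elect_master are hypothetical, "up to date" is reduced to a single
transaction number, an altered quorum is modelled as an explicit required node
count, and unelectable nodes are assumed to still count towards quorum.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    """Hypothetical view of one cluster node; not a Qpid or BDB JE type."""
    name: str
    last_txn: int           # assumed measure of how up to date the node is
    priority: int = 1       # higher value is favoured among equally current nodes
    electable: bool = True  # False: stays in the cluster but is never elected


def has_quorum(alive: int, cluster_size: int,
               override: Optional[int] = None) -> bool:
    """Simple majority by default; `override` models a user-altered quorum."""
    required = override if override is not None else cluster_size // 2 + 1
    return alive >= required


def elect_master(alive_nodes: List[Node], cluster_size: int,
                 quorum_override: Optional[int] = None) -> Optional[Node]:
    """Return the elected node, or None without quorum or electable nodes."""
    if not has_quorum(len(alive_nodes), cluster_size, quorum_override):
        return None
    candidates = [n for n in alive_nodes if n.electable]
    if not candidates:
        return None
    # The most up-to-date node wins; priority only breaks ties between
    # nodes whose transaction sets are equally current.
    return max(candidates, key=lambda n: (n.last_txn, n.priority))


if __name__ == "__main__":
    nodes = [
        Node("primary-1", last_txn=100, priority=2),
        Node("primary-2", last_txn=100, priority=1),
        Node("dr-1", last_txn=100, electable=False),
    ]
    print(elect_master(nodes, cluster_size=3).name)  # prints "primary-1"
```

In this model a lone survivor of a three-node cluster elects nothing until the
quorum is altered, for example elect_master(nodes[:1], 3, quorum_override=1),
which corresponds to the "alter quorum" requirement for extraordinary
situations.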