Alex Rudyy created QPID-5409:
--------------------------------
Summary: [Java Broker] Add support for multi-node HA cluster into
BDB JE HA message store
Key: QPID-5409
URL: https://issues.apache.org/jira/browse/QPID-5409
Project: Qpid
Issue Type: Improvement
Components: Java Broker
Reporter: Alex Rudyy
Fix For: 0.27
The Java Qpid Broker currently supports only two-node HA clusters with the BDB
JE message store. This JIRA aims to extend the current HA functionality and add
support for multi-node HA clusters to the BDB JE message store.
Here is the list of high-level requirements for the HA multi-node support:
# Only persistent messages are to be replicated. Transient messages will be
lost on failover.
# System must support clusters formed of an arbitrary number of nodes.
# System must continue to support clusters formed of two nodes. Existing
public interfaces (notably JMX and existing virtualhost.xml format) must be
retained (though may be deprecated) to allow existing users a convenient
upgrade path.
# System must allow a user to completely configure a new node via the
web-management interface.
# System must allow a user to monitor the nodes of the cluster in order to
ascertain its health and perform day-to-day operations via the
web-management interface.
## expose statistics to allow the health of the cluster to be established
(exact details to be determined: could be low level like DbPing or something
more abstract)
# System must be amenable to monitoring by third-party tools. The Broker should
emit clear operational log messages as it transitions between states. These
messages will be targeted at the end-user.
# System must permit a mode of operation whereby the user (or other external
agent) determines which virtual host becomes active following a store failover.
In this mode of operation, following a failure, the store nodes in the cluster
will elect a new master and the replica nodes will still sync up with the new
master node, but the system will not automatically mark the virtual host
corresponding to the master as active. The user will then transfer the master
to the desired location and mark the virtual host as active, allowing business
traffic to recommence.
# System must permit a mode of operation whereby the election of a store node
as master also causes the corresponding virtual host to become active.
# System must allow a user to influence a node's electability. These features
will allow a customer whose cluster spans primary/DR sites to keep the active
virtual hosts on the primary site in normal situations by favouring failover
within the primary site. Specifically, this will allow:
## making a node unelectable - the node, even though it remains part of the
cluster, will never be elected as master
## making a node more likely to be elected master than other nodes - that is,
if two or more nodes have an equally up to date set of transactions, the node
with the highest priority will be elected master
## node electability settings must survive restart
## node electability settings must be alterable at run-time
# System must allow a user to alter quorum. This feature is required in
extraordinary situations where the system is required to continue to operate
despite losing so many nodes that a simple majority no longer exists.
# System must allow the user to move the active virtual host from one node to
another. This feature will help a user to restore a system to its BAU state
following an extraordinary situation.
# System must provide a user with a read-only view of queues when the
underlying store is in a replica state. This must provide at least queue name
and queue depth. This feature will allow a user to see that replication is
indeed functioning.
# System must allow all HA operations to be permitted or denied according to
rules in the ACL.
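
To make the electability and quorum requirements above more concrete, here is
a minimal sketch of the intended rules. It is not the Qpid or BDB JE API: the
names ElectionSketch, Node, electMaster and hasQuorum are hypothetical, as are
the conventions that a priority of 0 marks a node as unelectable and that a
quorum override of 0 means "use simple majority". Actual election would be
carried out by the store's replication layer.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical model for illustration only; not the Qpid or BDB JE API.
final class ElectionSketch {

    /** A cluster node as seen by the election. All fields are illustrative. */
    static final class Node {
        final String name;
        final long lastTxnId; // higher means a more up-to-date store
        final int priority;   // assumed convention: 0 means "never elect"

        Node(String name, long lastTxnId, int priority) {
            this.name = name;
            this.lastTxnId = lastTxnId;
            this.priority = priority;
        }
    }

    /**
     * Picks the most up-to-date electable node. Priority is consulted only
     * when two or more nodes are equally up to date.
     */
    static Optional<Node> electMaster(List<Node> reachable) {
        return reachable.stream()
                .filter(n -> n.priority > 0) // unelectable nodes are skipped
                .max(Comparator.<Node>comparingLong(n -> n.lastTxnId)
                        .thenComparingInt(n -> n.priority));
    }

    /**
     * Simple majority of the cluster, unless the user has supplied an
     * override (assumed convention: 0 means no override).
     */
    static boolean hasQuorum(int reachable, int clusterSize, int override) {
        int required = override > 0 ? override : clusterSize / 2 + 1;
        return reachable >= required;
    }
}
```

In this sketch a three-node cluster with one surviving node has no quorum
until the user lowers the required count to 1, which corresponds to the
"alter quorum" requirement for extraordinary situations.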