Active-Passive-Cluster.xml

aconway Wed, 28 Mar 2012 09:25:22 -0700

Author: aconway
Date: Wed Mar 28 16:24:52 2012
New Revision: 1306454

URL: http://svn.apache.org/viewvc?rev=1306454&view=rev
Log:
QPID-3603: Update HA  documentation: example of virtual IP addresses


Modified:
    qpid/trunk/qpid/cpp/etc/cluster.conf-example.xml.in
    qpid/trunk/qpid/doc/book/src/Active-Passive-Cluster.xml

Modified: qpid/trunk/qpid/cpp/etc/cluster.conf-example.xml.in
URL: http://svn.apache.org/viewvc/qpid/trunk/qpid/cpp/etc/cluster.conf-example.xml.in?rev=1306454&r1=1306453&r2=1306454&view=diff
==============================================================================
--- qpid/trunk/qpid/cpp/etc/cluster.conf-example.xml.in (original)
+++ qpid/trunk/qpid/cpp/etc/cluster.conf-example.xml.in Wed Mar 28 16:24:52 2012
@@ -7,14 +7,18 @@ This example assumes a 3 node cluster, w
 <cluster name="qpid-test" config_version="18">
   <!-- The cluster has 3 nodes. Each has a unique nodeid and one vote for quorum. -->
   <clusternodes>
-    <clusternode name="node1" nodeid="1"/>
-    <clusternode name="node2" nodeid="2"/>
-    <clusternode name="node3" nodeid="3"/>
+    <clusternode name="node1" nodeid="1">
+      <fence/>
+    </clusternode>
+    <clusternode name="node2" nodeid="2">
+      <fence/>
+    </clusternode>
+    <clusternode name="node3" nodeid="3">
+      <fence/>
+    </clusternode>
   </clusternodes>
-  <!-- Resouce Manager configuration.
-       TODO explain central_processing="1"
-  -->
-  <rm log_level="7" central_processing="1">
+  <!-- Resource Manager configuration. -->
+  <rm log_level="7">           <!-- Verbose logging -->
     <!--
        There is a failoverdomain for each node containing just that node.
        This lets us stipulate that the qpidd service should always run on all nodes.
@@ -59,7 +63,7 @@ This example assumes a 3 node cluster, w
     <!-- There should always be a single qpidd-primary service, it can run on any node. -->
     <service name="qpidd-primary-service" autostart="1" exclusive="0" recovery="relocate">
       <script ref="qpidd-primary"/>
-      <!-- The primary has the IP addresses for brokers and clients. -->
+      <!-- The primary has the IP addresses for brokers and clients to connect. -->
       <ip ref="20.0.10.200"/>
       <ip ref="20.0.20.200"/>
     </service>
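
Reader note: the layout this example relies on (one restricted single-node failover domain per node, a qpidd service pinned to each domain with recovery="restart", and exactly one unrestricted qpidd-primary service with recovery="relocate") can be checked mechanically. The following is only a minimal sketch in Python using the standard library. The function name check_cluster_conf and the trimmed SAMPLE string are illustrative assumptions, not part of Qpid or rgmanager.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example layout (comments, fencing and <resources> omitted).
SAMPLE = """\
<cluster name="qpid-test" config_version="18">
  <clusternodes>
    <clusternode name="node1" nodeid="1"/>
    <clusternode name="node2" nodeid="2"/>
    <clusternode name="node3" nodeid="3"/>
  </clusternodes>
  <rm log_level="7">
    <failoverdomains>
      <failoverdomain name="node1-domain" restricted="1">
        <failoverdomainnode name="node1"/>
      </failoverdomain>
      <failoverdomain name="node2-domain" restricted="1">
        <failoverdomainnode name="node2"/>
      </failoverdomain>
      <failoverdomain name="node3-domain" restricted="1">
        <failoverdomainnode name="node3"/>
      </failoverdomain>
    </failoverdomains>
    <service name="node1-qpidd-service" domain="node1-domain" recovery="restart">
      <script ref="qpidd"/>
    </service>
    <service name="node2-qpidd-service" domain="node2-domain" recovery="restart">
      <script ref="qpidd"/>
    </service>
    <service name="node3-qpidd-service" domain="node3-domain" recovery="restart">
      <script ref="qpidd"/>
    </service>
    <service name="qpidd-primary-service" autostart="1" exclusive="0" recovery="relocate">
      <script ref="qpidd-primary"/>
      <ip ref="20.0.10.200"/>
      <ip ref="20.0.20.200"/>
    </service>
  </rm>
</cluster>
"""


def uses_script(service, ref):
    """True if the service element references the named script resource."""
    return any(s.get("ref") == ref for s in service.findall("script"))


def check_cluster_conf(xml_text):
    """Hypothetical helper: return a list of layout problems (empty if none)."""
    root = ET.fromstring(xml_text)
    problems = []
    nodes = [n.get("name") for n in root.findall("./clusternodes/clusternode")]

    # Map each restricted, single-member failover domain to its node.
    domain_node = {}
    for dom in root.findall("./rm/failoverdomains/failoverdomain"):
        members = [m.get("name") for m in dom.findall("failoverdomainnode")]
        if dom.get("restricted") == "1" and len(members) == 1:
            domain_node[dom.get("name")] = members[0]

    services = root.findall("./rm/service")

    # Every node needs a qpidd service pinned to its own domain.
    pinned = set()
    for svc in services:
        if (svc.get("recovery") == "restart"
                and svc.get("domain") in domain_node
                and uses_script(svc, "qpidd")):
            pinned.add(domain_node[svc.get("domain")])
    for node in nodes:
        if node not in pinned:
            problems.append("no pinned qpidd service for " + node)

    # Exactly one primary service, free to relocate to any node.
    primaries = [s for s in services if uses_script(s, "qpidd-primary")]
    if len(primaries) != 1:
        problems.append("expected exactly one qpidd-primary service")
    elif (primaries[0].get("recovery") != "relocate"
          or primaries[0].get("domain") is not None):
        problems.append("qpidd-primary service must be unrestricted with recovery=relocate")
    return problems
```

Calling check_cluster_conf(SAMPLE) returns an empty list for the layout above. The check covers structure only; it does not validate fencing, the virtual IP resources, or anything rgmanager itself enforces.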

Modified: qpid/trunk/qpid/doc/book/src/Active-Passive-Cluster.xml
URL: http://svn.apache.org/viewvc/qpid/trunk/qpid/doc/book/src/Active-Passive-Cluster.xml?rev=1306454&r1=1306453&r2=1306454&view=diff
==============================================================================
--- qpid/trunk/qpid/doc/book/src/Active-Passive-Cluster.xml (original)
+++ qpid/trunk/qpid/doc/book/src/Active-Passive-Cluster.xml Wed Mar 28 16:24:52 2012
@@ -13,7 +13,7 @@ http://www.apache.org/licenses/LICENSE-2
 
 Unless required by applicable law or agreed to in writing,
 software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+h"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 KIND, either express or implied.  See the License for the
 specific language governing permissions and limitations
 under the License.
@@ -148,15 +148,17 @@ under the License.
   <section>
     <title>Virtual IP Addresses</title>
     <para>
-      Some resource managers (including <command>rgmanager</command>) support 
<firstterm>virtual IP
-      addresses</firstterm>. A virtual IP address is an IP address that can be 
relocated to any of
-      the nodes in a cluster.  The resource manager associates this address 
with the primary node in
-      the cluster, and relocates it to the new primary when there is a 
failure. This simplifies
-      configuration as you can publish a single IP address rather than a list.
+      Some resource managers (including <command>rgmanager</command>) support
+      <firstterm>virtual IP addresses</firstterm>. A virtual IP address is an IP
+      address that can be relocated to any of the nodes in a cluster.  The
+      resource manager associates this address with the primary node in the
+      cluster, and relocates it to the new primary when there is a failure. This
+      simplifies configuration as you can publish a single IP address rather
+      than a list.
     </para>
     <para>
-      A virtual IP address can be used by clients to connect to the primary, 
and also by backup
-      brokers when they connect to the primary. The following sections will 
explain how to configure
+      A virtual IP address can be used by clients and backup brokers to connect
+      to the primary. The following sections will explain how to configure
       virtual IP addresses for clients or brokers.
     </para>
   </section>
@@ -266,42 +268,61 @@ under the License.
     <para>
       You can create replicated queues and exchanges with the <command>qpid-config</command>
       management tool like this:
-      <programlisting>
-       qpid-config add queue myqueue --replicate all
-      </programlisting>
     </para>
+    <programlisting>
+      qpid-config add queue myqueue --replicate all
+    </programlisting>
     <para>
       To create replicated queues and exchanges via the client API, add a <literal>node</literal> entry to the address like this:
-      <programlisting>
-       
"myqueue;{create:always,node:{x-declare:{arguments:{'qpid.replicate':all}}}}"
-      </programlisting>
     </para>
+    <programlisting>
+      "myqueue;{create:always,node:{x-declare:{arguments:{'qpid.replicate':all}}}}"
+    </programlisting>
   </section>
 
   <section>
     <title>Client Connection and Fail-over</title>
     <para>
-      Clients can only connect to the primary broker. Backup brokers 
automatically reject any
-      connection attempt by a client.
+      Clients can only connect to the primary broker. Backup brokers
+      automatically reject any connection attempt by a client.
     </para>
     <para>
-      Clients are configured with the URL for the cluster. There are two 
possibilities
+      Clients are configured with the URL for the cluster (details below for
+      each type of client). There are two possibilities:
       <itemizedlist>
-       <listitem> The URL contains multiple addresses, one for each broker in 
the cluster.</listitem>
        <listitem>
-         The URL contains a single <firstterm>virtual IP address</firstterm> 
that is assigned to the primary broker by the resource manager.
+         The URL contains multiple addresses, one for each broker in the cluster.
+       </listitem>
+       <listitem>
+         The URL contains a single <firstterm>virtual IP address</firstterm>
+         that is assigned to the primary broker by the resource manager.
          <footnote><para>Only if the resource manager supports virtual IP addresses</para></footnote>
        </listitem>
       </itemizedlist>
-      In the first case, clients will repeatedly re-try each address in the 
URL until they
-      successfully connect to the primary. In the second case the resource 
manager will assign the
-      virtual IP address to the primary broker, so clients only need to re-try 
on a single address.
-    </para>
-    <para>
-      When the primary broker fails all clients are disconnected. They go back 
to re-trying until
-      they connect to the new primary.  Any messages that have been sent by 
the client, but not yet
-      acknowledged as delivered, are resent. Similarly messages that have been 
sent by the broker,
-      but not acknowledged, are re-queued.
+      In the first case, clients will repeatedly re-try each address in the URL
+      until they successfully connect to the primary. In the second case the
+      resource manager will assign the virtual IP address to the primary broker,
+      so clients only need to re-try on a single address.
+    </para>
+    <para>
+      When the primary broker fails, clients re-try all known cluster addresses
+      until they connect to the new primary.  The client re-sends any messages
+      that were previously sent but not acknowledged by the broker at the time
+      of the failure.  Similarly, messages that have been sent by the broker, but
+      not acknowledged by the client, are re-queued.
+    </para>
+    <para>
+      TCP can be slow to detect connection failures. A client can configure a
+      connection to use a <firstterm>heartbeat</firstterm> to detect connection
+      failure, and can specify a time interval for the heartbeat. If heartbeats
+      are in use, failures will be detected no later than twice the heartbeat
+      interval. The following sections explain how to enable heartbeat in each
+      client.
+    </para>
+    <para>
+      See &#34;Cluster Failover&#34; in <citetitle>Programming in Apache
+      Qpid</citetitle> for details on how to keep the client aware of cluster
+      membership.
     </para>
     <para>
       Suppose your cluster has 3 nodes: <literal>node1</literal>, <literal>node2</literal>
@@ -316,39 +337,57 @@ under the License.
        <footnote>
          <para>
            The full grammar for the URL is:
-           <programlisting>
-             url = ["amqp:"][ user ["/" password] "@" ] addr ("," addr)*
-             addr = tcp_addr / rmda_addr / ssl_addr / ...
-             tcp_addr = ["tcp:"] host [":" port]
-             rdma_addr = "rdma:" host [":" port]
-             ssl_addr = "ssl:" host [":" port]'
-           </programlisting>
          </para>
-         </footnote>. You also
-         need to specify the connection option <literal>reconnect</literal> to 
be true. For
-         example:
          <programlisting>
-           qpid::messaging::Connection 
c("node1,node2,node3","{reconnect:true}");
+           url = ["amqp:"][ user ["/" password] "@" ] addr ("," addr)*
+           addr = tcp_addr / rdma_addr / ssl_addr / ...
+           tcp_addr = ["tcp:"] host [":" port]
+           rdma_addr = "rdma:" host [":" port]
+           ssl_addr = "ssl:" host [":" port]
          </programlisting>
+       </footnote>
+       You also need to specify the connection option
+       <literal>reconnect</literal> to be true.  For example:
+      </para>
+      <programlisting>
+       qpid::messaging::Connection c("node1,node2,node3","{reconnect:true}");
+      </programlisting>
+      <para>
+       Heartbeats are disabled by default. You can enable them by specifying a
+       heartbeat interval (in seconds) for the connection via the
+       <literal>heartbeat</literal> option. For example:
+       <programlisting>
+         qpid::messaging::Connection c("node1,node2,node3","{reconnect:true,heartbeat:10}");
+       </programlisting>
       </para>
     </section>
     <section>
       <title>Python clients</title>
       <para>
-      With the python client, you specify <literal>reconnect=True</literal> 
and a list of
-      <replaceable>host:port</replaceable> addresses as 
<literal>reconnect_urls</literal>
-      when calling <literal>Connection.establish</literal> or 
<literal>Connection.open</literal>
+       With the Python client, you specify <literal>reconnect=True</literal>
+       and a list of <replaceable>host:port</replaceable> addresses as
+       <literal>reconnect_urls</literal> when calling
+       <literal>Connection.establish</literal> or
+       <literal>Connection.open</literal>:
+      </para>
       <programlisting>
        connection = qpid.messaging.Connection.establish("node1", reconnect=True, reconnect_urls=["node1", "node2", "node3"])
       </programlisting>
+      <para>
+       Heartbeats are disabled by default. You can
+       enable them by specifying a heartbeat interval (in seconds) for the
+       connection via the &#39;heartbeat&#39; option. For example:
       </para>
+      <programlisting>
+       connection = qpid.messaging.Connection.establish("node1", reconnect=True, reconnect_urls=["node1", "node2", "node3"], heartbeat=10)
+      </programlisting>
     </section>
     <section>
       <title>Java JMS Clients</title>
       <para>
-       In Java JMS clients, client fail-over is handled automatically if it is 
enabled in the
-       connection.  You can configure a connection to use fail-over using the
-       <command>failover</command> property:
+       In Java JMS clients, client fail-over is handled automatically if it is
+       enabled in the connection.  You can configure a connection to use
+       fail-over using the <command>failover</command> property:
       </para>
 
       <screen>
@@ -398,33 +437,35 @@ under the License.
       <screen>
        connectionfactory.qpidConnectionfactory = amqp://guest:guest@clientid/test?brokerlist=&#39;tcp://localhost:5672&#39;,idle_timeout=3
       </screen>
-
     </section>
   </section>
 
   <section>
     <title>The Cluster Resource Manager</title>
     <para>
-      Broker fail-over is managed by a <firstterm>cluster resource manager</firstterm>.  An
-      integration with <ulink
-      url="https://fedorahosted.org/cluster/wiki/RGManager">rgmanager</ulink> is provided, but it is
-      possible to integrate with other resource managers.
+      Broker fail-over is managed by a <firstterm>cluster resource
+      manager</firstterm>.  An integration with <ulink
+      url="https://fedorahosted.org/cluster/wiki/RGManager">rgmanager</ulink> is
+      provided, but it is possible to integrate with other resource managers.
     </para>
     <para>
-      The resource manager is responsible for starting an 
appropriately-configured broker on each
-      node in the cluster.  The resource manager then 
<firstterm>promotes</firstterm> one of the
-      brokers to be the primary. The other brokers connect to the primary as 
backups, using the URL
-      provided in the <literal>ha-brokers</literal> configuration option.
+      The resource manager is responsible for starting a broker on each node in the
+      cluster.  The resource manager then <firstterm>promotes</firstterm> one of
+      the brokers to be the primary. The other brokers connect to the primary as
+      backups, using the URL provided in the <literal>ha-brokers</literal>
+      configuration option.
     </para>
     <para>
-      Once connected, the backup brokers synchronize their state with the 
primary.  When a backup is
-      synchronized, or "hot", it is ready to take over if the primary fails.  
Backup brokers
-      continually receive updates from the primary in order to stay 
synchronized.
+      Once connected, the backup brokers synchronize their state with the
+      primary.  When a backup is synchronized, or "hot", it is ready to take
+      over if the primary fails.  Backup brokers continually receive updates
+      from the primary in order to stay synchronized.
     </para>
     <para>
-      If the primary fails, backup brokers go into fail-over mode. The 
resource manager must detect
-      the failure and promote one of the backups to be the new primary.  The 
other backups connect
-      to the new primary and synchronize their state so they can be backups 
for it.
+      If the primary fails, backup brokers go into fail-over mode. The resource
+      manager must detect the failure and promote one of the backups to be the
+      new primary.  The other backups connect to the new primary and synchronize
+      their state so they can be backups for it.
     </para>
     <para>
       The resource manager is also responsible for protecting the cluster from
@@ -437,65 +478,84 @@ under the License.
   <section>
     <title>Configuring <command>rgmanager</command> as resource manager</title>
     <para>
-      This section assumes that you are already familiar with setting up and 
configuring
-      clustered services using <command>cman</command> and 
<command>rgmanager</command>. It
-      will show you how to configure an active-passive, hot-standby 
<command>qpidd</command>
-      HA cluster.
+      This section assumes that you are already familiar with setting up and
+      configuring clustered services using <command>cman</command> and
+      <command>rgmanager</command>. It will show you how to configure an
+      active-passive, hot-standby <command>qpidd</command> HA cluster.
     </para>
     <para>
-      Here is an example <literal>cluster.conf</literal> file for a cluster of 
3 nodes named
-      mrg32, mrg34 and mrg35. We will go through the configuration 
step-by-step.
+      Here is an example <literal>cluster.conf</literal> file for a cluster of 3
+      nodes named node1, node2 and node3. We will go through the configuration
+      step-by-step.
     </para>
     <programlisting>
-<![CDATA[
+      <![CDATA[
 <?xml version="1.0"?>
-<cluster alias="qpid-hot-standby" config_version="4" name="qpid-hot-standby">
+<!--
+This is an example of a cluster.conf file to run qpidd HA under rgmanager.
+This example assumes a 3 node cluster, with nodes named node1, node2 and node3.
+-->
+
+<cluster name="qpid-test" config_version="18">
+  <!-- The cluster has 3 nodes. Each has a unique nodeid and one vote for quorum. -->
   <clusternodes>
-    <clusternode name="mrg32" nodeid="1">
+    <clusternode name="node1" nodeid="1">
       <fence/>
     </clusternode>
-    <clusternode name="mrg34" nodeid="2">
+    <clusternode name="node2" nodeid="2">
       <fence/>
     </clusternode>
-    <clusternode name="mrg35" nodeid="3">
+    <clusternode name="node3" nodeid="3">
       <fence/>
     </clusternode>
   </clusternodes>
-  <cman/>
-  <rm log_level="7"            <!-- Verbose logging -->
-      central_processing="1">  <!-- TODO explain-->
+  <!-- Resource Manager configuration. -->
+  <rm log_level="7">           <!-- Verbose logging -->
+    <!--
+       There is a failoverdomain for each node containing just that node.
+       This lets us stipulate that the qpidd service should always run on all nodes.
+    -->
     <failoverdomains>
-      <failoverdomain name="mrg32-domain" restricted="1">
-       <failoverdomainnode name="mrg32"/>
+      <failoverdomain name="node1-domain" restricted="1">
+       <failoverdomainnode name="node1"/>
       </failoverdomain>
-      <failoverdomain name="mrg34-domain" restricted="1">
-       <failoverdomainnode name="mrg34"/>
+      <failoverdomain name="node2-domain" restricted="1">
+       <failoverdomainnode name="node2"/>
       </failoverdomain>
-      <failoverdomain name="mrg35-domain" restricted="1">
-       <failoverdomainnode name="mrg35"/>
+      <failoverdomain name="node3-domain" restricted="1">
+       <failoverdomainnode name="node3"/>
       </failoverdomain>
     </failoverdomains>
+
     <resources>
-      <script file="/etc/init.d/qpidd" name="qpidd"/>
-      <script file="/etc/init.d/qpidd-primary" name="qpidd-primary"/>
+      <!-- This script starts a qpidd broker acting as a backup. -->
+      <script file="!!sysconfdir!!/init.d/qpidd" name="qpidd"/>
+
+      <!-- This script promotes the qpidd broker on this node to primary. -->
+      <script file="!!sysconfdir!!/init.d/qpidd-primary" name="qpidd-primary"/>
+
+      <!-- This is a virtual IP address for broker replication traffic. -->
       <ip address="20.0.10.200" monitor_link="1"/>
+
+      <!-- This is a virtual IP address on a separate network for client traffic. -->
       <ip address="20.0.20.200" monitor_link="1"/>
     </resources>
 
     <!-- There is a qpidd service on each node, it should be restarted if it fails. -->
-    <service name="mrg32-qpidd-service" domain="mrg32-domain" recovery="restart">
+    <service name="node1-qpidd-service" domain="node1-domain" recovery="restart">
       <script ref="qpidd"/>
     </service>
-    <service name="mrg34-qpidd-service" domain="mrg34-domain" recovery="restart">
+    <service name="node2-qpidd-service" domain="node2-domain" recovery="restart">
       <script ref="qpidd"/>
     </service>
-    <service name="mrg35-qpidd-service" domain="mrg35-domain"  recovery="restart">
+    <service name="node3-qpidd-service" domain="node3-domain"  recovery="restart">
       <script ref="qpidd"/>
     </service>
 
     <!-- There should always be a single qpidd-primary service, it can run on any node. -->
     <service name="qpidd-primary-service" autostart="1" exclusive="0" recovery="relocate">
       <script ref="qpidd-primary"/>
+      <!-- The primary has the IP addresses for brokers and clients to connect. -->
       <ip ref="20.0.10.200"/>
       <ip ref="20.0.20.200"/>
     </service>
@@ -503,7 +563,7 @@ under the License.
   <fencedevices/>
   <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
 </cluster>
-]]>
+      ]]>
     </programlisting>
     <para>
       There is a <literal>failoverdomain</literal> for each node containing just that
@@ -511,32 +571,48 @@ under the License.
       nodes.
     </para>
     <para>
-      The <literal>resources</literal> section defines the usual 
initialization script to
-      start the <command>qpidd</command> service.  <command>qpidd</command>. 
It also
-      defines the <command>qpid-primary</command> script. Starting this script 
does not
+      The <literal>resources</literal> section defines the usual initialization
+      script to start the <command>qpidd</command> service.
+      It also defines the
+      <command>qpidd-primary</command> script. Starting this script does not
       actually start a new service, rather it promotes the existing
       <command>qpidd</command> broker to primary status.
     </para>
     <para>
       The <literal>resources</literal> section also defines a pair of virtual IP
       addresses on different sub-nets. One will be used for broker-to-broker
-      communication, the other for client-to-broker.
+      communication, the other for client-to-broker. 
+    </para>
+    <para>
+      To take advantage of the virtual IP addresses, <filename>qpidd.conf</filename>
+      should contain these lines:
+    </para>
+    <programlisting>
+      ha-cluster=yes
+      ha-brokers=20.0.20.200
+      ha-public-brokers=20.0.10.200
+    </programlisting>
+    <para>
+      This configuration specifies that backup brokers will use 20.0.20.200
+      to connect to the primary and will advertise 20.0.10.200 to clients.
+      Clients should connect to 20.0.10.200.
     </para>
     <para>
-      The <literal>service</literal> section defines 3 
<command>qpidd</command> services,
-      one for each node. Each service is in a restricted fail-over domain 
containing just
-      that node, and has the <literal>restart</literal> recovery policy. The 
effect of
-      this is that rgmanager will run <command>qpidd</command> on each node, 
restarting if
-      it fails.
+      The <literal>service</literal> section defines 3 <command>qpidd</command>
+      services, one for each node. Each service is in a restricted fail-over
+      domain containing just that node, and has the <literal>restart</literal>
+      recovery policy. The effect of this is that rgmanager will run
+      <command>qpidd</command> on each node, restarting if it fails.
     </para>
     <para>
       There is a single <literal>qpidd-primary-service</literal> running the
-      <command>qpidd-primary</command> script which is not restricted to a 
domain and has
-      the <literal>relocate</literal> recovery policy. This means rgmanager 
will start
-      <command>qpidd-primary</command> on one of the nodes when the cluster 
starts and
-      will relocate it to another node if the original node fails. Running the
-      <literal>qpidd-primary</literal> script does not actually start a new 
process,
-      rather it promotes the existing broker to become the primary.
+      <command>qpidd-primary</command> script which is not restricted to a
+      domain and has the <literal>relocate</literal> recovery policy. This means
+      rgmanager will start <command>qpidd-primary</command> on one of the nodes
+      when the cluster starts and will relocate it to another node if the
+      original node fails. Running the <literal>qpidd-primary</literal> script
+      does not start a new broker process; it promotes the existing broker to
+      become the primary.
     </para>
   </section>
 
@@ -556,14 +632,6 @@ under the License.
       <command>qpid-stat</command> will connect to a backup if you pass the flag <command>--ha-admin</command> on the
       command line.
     </para>
-    <para>
-      To promote a broker to primary use the following command:
-      <programlisting>
-       qpid-ha promote -b 
<replaceable>host</replaceable>:<replaceable>port</replaceable>
-      </programlisting>
-      The resource manager must ensure that it does not promote a broker to 
primary when
-      there is already a primary in the cluster.
-    </para>
   </section>
 </section>
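
Reader note: the URL grammar quoted in the footnote above (optional "amqp:" prefix, optional user/password, then a comma-separated list of addresses with an optional protocol and port) can be illustrated with a small parser. This is a rough sketch in Python, not the parser used by any Qpid client. The function name parse_broker_url is hypothetical, the default port 5672 is an assumption made for the example, and only the tcp, rdma and ssl protocols named in the grammar are recognized (the "..." alternatives and IPv6 literals are not handled).

```python
def parse_broker_url(url, default_port=5672):
    """Hypothetical helper: split a broker URL into (user, password, addresses).

    Each address is a (protocol, host, port) tuple. The default port is an
    assumption for this sketch, not something the grammar specifies.
    """
    # url = ["amqp:"][ user ["/" password] "@" ] addr ("," addr)*
    if url.startswith("amqp:"):
        url = url[len("amqp:"):]

    user = password = None
    if "@" in url:
        credentials, url = url.split("@", 1)
        user, _, pw = credentials.partition("/")
        password = pw or None

    addresses = []
    for addr in url.split(","):
        # tcp_addr = ["tcp:"] host [":" port]; rdma and ssl need their prefix.
        protocol = "tcp"
        for prefix in ("tcp", "rdma", "ssl"):
            if addr.startswith(prefix + ":"):
                protocol = prefix
                addr = addr[len(prefix) + 1:]
                break
        host, _, port = addr.partition(":")
        addresses.append((protocol, host, int(port) if port else default_port))
    return user, password, addresses
```

For the three-node URL used in the client examples, parse_broker_url("node1,node2,node3") yields no credentials and three tcp addresses, one per broker, which is the list a client would re-try in turn after a failure.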
 




svn commit: r1306454 - in /qpid/trunk/qpid: cpp/etc/cluster.conf-example.xml.in doc/book/src/Active-Passive-Cluster.xml