Pascal Garcia created ARTEMIS-2609:
--------------------------------------

             Summary: Ha-policy collocated not working.
                 Key: ARTEMIS-2609
                 URL: https://issues.apache.org/jira/browse/ARTEMIS-2609
             Project: ActiveMQ Artemis
          Issue Type: Bug
          Components: AMQP
    Affects Versions: 2.11.0
            Reporter: Pascal Garcia


I am using Artemis 2.11.0 (the latest release at the time of writing). The 
problem also affects the earlier versions I have tested.

I have set up a cluster of 3 servers. Below is an extract of the broker.xml 
configuration of the first server; the configurations of the other servers are 
symmetric.
{code:xml}
      <connectors>
         <!-- Connector used to be announced through cluster connections and notifications -->
         <connector name="cluster-connector">tcp://f3slsea387:61616</connector>
         <connector name="f3slsea389">tcp://f3slsea389:61616</connector>
         <connector name="f3slsea388">tcp://f3slsea388:61616</connector>
      </connectors>
 {code}
...
{code:xml}
      <cluster-connections>
         <cluster-connection name="my-cluster">
            <address></address>
            <connector-ref>cluster-connector</connector-ref>
            <check-period>1000</check-period>
            <connection-ttl>5000</connection-ttl>
            <call-timeout>5000</call-timeout>
            <retry-interval>500</retry-interval>
            <use-duplicate-detection>true</use-duplicate-detection>
            <message-load-balancing>ON_DEMAND</message-load-balancing>
            <max-hops>1</max-hops>
            <!-- <static-connectors allow-direct-connections-only="true"> -->
            <static-connectors>
               <connector-ref>f3slsea389</connector-ref>
               <connector-ref>f3slsea388</connector-ref>
            </static-connectors>
         </cluster-connection>
      </cluster-connections>
 {code}
This works fine: a consumer receives all the messages, regardless of which 
server it is connected to and which server the producer delivered the messages 
to.

Now comes the ha-policy.
{code:xml}
      <ha-policy>
         <replication>
            <colocated>
                <request-backup>true</request-backup>
                <max-backups>3</max-backups>
                <backup-request-retries>-1</backup-request-retries>
                <backup-request-retry-interval>5000</backup-request-retry-interval>
                <backup-port-offset>10</backup-port-offset>
                <failover-on-shutdown>true</failover-on-shutdown>
                <excludes>
                    <connector-ref>cluster-connector</connector-ref>
                </excludes>
                <master>
                    <check-for-live-server>true</check-for-live-server>
                    <initial-replication-sync-timeout>30000</initial-replication-sync-timeout>
                </master>
                <slave>
                    <allow-failback>true</allow-failback>
                    <restart-backup>true</restart-backup>
                    <initial-replication-sync-timeout>30000</initial-replication-sync-timeout>
                    <!-- <max-saved-replicated-journals-size>10</max-saved-replicated-journals-size> -->
                </slave>
            </colocated>
         </replication>
      </ha-policy>
{code}
With this policy, I expect every server to be backed up on another server. This 
seems to work, as I find backup journals on the different servers.

When a server fails, I also expect the server that backs it up to take over the 
backed-up messages and make them available to a consumer connected to one of 
the remaining servers. But this does not work: the messages are not consumed.

However, the messages are not lost: as soon as the failed server restarts, the 
messages are consumed.

Note that I do not use scale-down. With scale-down the messages are delivered 
but, when the failed server comes up again, the messages are delivered a second 
time. That is the expected behavior, but it is not suitable in my case.

Note also that I never saw any of the servers listening on a port other than 
the ones configured in the acceptors, and therefore I do not understand what 
backup-port-offset is meant for.
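
To make that expectation concrete, here is a sketch of how I assumed the offset 
would apply. The acceptor name and bind address are illustrative, not copied 
from my broker.xml; only port 61616 matches the connectors above. The 
assumption that the colocated backup simply adds the offset to the port is my 
own reading and may be wrong.
{code:xml}
      <acceptors>
         <!-- illustrative live acceptor (hypothetical name and bind address) -->
         <acceptor name="artemis">tcp://0.0.0.0:61616</acceptor>
         <!-- assumption: with backup-port-offset 10, a colocated backup would
              listen on 61616 + 10 = 61626 -->
      </acceptors>
{code}
By that reading I would expect a listener on port 61626 once a backup is 
colocated on this server.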

What is wrong or missing in this configuration for failures to be handled 
properly?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
