Pascal Garcia created ARTEMIS-2609:
--------------------------------------
Summary: Ha-policy collocated not working.
Key: ARTEMIS-2609
URL: https://issues.apache.org/jira/browse/ARTEMIS-2609
Project: ActiveMQ Artemis
Issue Type: Bug
Components: AMQP
Affects Versions: 2.11.0
Reporter: Pascal Garcia
I use Artemis 2.11.0 (the latest at the time of writing). The problem also
affects the earlier versions I have tested.
I have set up a cluster of 3 servers. Below is an extract of the broker.xml
configuration of the first server. The configurations of the other servers are
symmetric.
{code:xml}
<connectors>
<!-- Connector used to be announced through cluster connections and
notifications -->
<connector name="cluster-connector">tcp://f3slsea387:61616</connector>
<connector name="f3slsea389">tcp://f3slsea389:61616</connector>
<connector name="f3slsea388">tcp://f3slsea388:61616</connector>
</connectors>
{code}
...
{code:xml}
<cluster-connections>
<cluster-connection name="my-cluster">
<address></address>
<connector-ref>cluster-connector</connector-ref>
<check-period>1000</check-period>
<connection-ttl>5000</connection-ttl>
<call-timeout>5000</call-timeout>
<retry-interval>500</retry-interval>
<use-duplicate-detection>true</use-duplicate-detection>
<message-load-balancing>ON_DEMAND</message-load-balancing>
<max-hops>1</max-hops>
<!-- <static-connectors allow-direct-connections-only="true"> -->
<static-connectors>
<connector-ref>f3slsea389</connector-ref>
<connector-ref>f3slsea388</connector-ref>
</static-connectors>
</cluster-connection>
</cluster-connections>
{code}
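To illustrate what "symmetric" means here, the corresponding fragments on the
second server (f3slsea388) would look roughly like the sketch below. It is
inferred from the first server's configuration by swapping host names, not
copied from the actual file. All other cluster-connection settings are assumed
to be unchanged.
{code:xml}
<!-- Sketch for f3slsea388, assuming the same naming scheme as the first server -->
<connectors>
<!-- The announced connector now points at this server itself -->
<connector name="cluster-connector">tcp://f3slsea388:61616</connector>
<connector name="f3slsea387">tcp://f3slsea387:61616</connector>
<connector name="f3slsea389">tcp://f3slsea389:61616</connector>
</connectors>
<!-- Inside <cluster-connection name="my-cluster">, only the static list changes -->
<static-connectors>
<connector-ref>f3slsea387</connector-ref>
<connector-ref>f3slsea389</connector-ref>
</static-connectors>
{code}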
This works fine:
A consumer receives all the messages, regardless of which server it is connected
to and which server the producer sent the messages to.
Now comes the ha-policy.
{code:xml}
<ha-policy>
<replication>
<colocated>
<request-backup>true</request-backup>
<max-backups>3</max-backups>
<backup-request-retries>-1</backup-request-retries>
<backup-request-retry-interval>5000</backup-request-retry-interval>
<backup-port-offset>10</backup-port-offset>
<failover-on-shutdown>true</failover-on-shutdown>
<excludes>
<connector-ref>cluster-connector</connector-ref>
</excludes>
<master>
<check-for-live-server>true</check-for-live-server>
<initial-replication-sync-timeout>30000</initial-replication-sync-timeout>
</master>
<slave>
<allow-failback>true</allow-failback>
<restart-backup>true</restart-backup>
<initial-replication-sync-timeout>30000</initial-replication-sync-timeout>
<!--
<max-saved-replicated-journals-size>10</max-saved-replicated-journals-size> -->
</slave>
</colocated>
</replication>
</ha-policy>
{code}
With this policy, I expect every server to be backed up on another server. This
seems to work, as I find backup journals on the different servers.
When a server fails, I also expect the server holding its backup to take over
the backed-up messages and make them available to a consumer connected to one
of the remaining servers. But this does not work: the messages are not consumed.
However, the messages are not lost, and as soon as the failed server restarts,
the messages are consumed.
Note that I do not use scale-down. With scale-down the messages are delivered,
but when the failed server comes up again the messages are delivered a second
time, which is the expected behavior but not suitable in my case.
Note also that I never saw any of the servers listening on a port other than
the ones configured in the acceptors, and therefore I do not understand what
backup-port-offset is meant for.
What is wrong or missing in this configuration to have failures handled
properly?