[
https://issues.apache.org/jira/browse/ARTEMIS-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441434#comment-16441434
]
Catalin Alexandru Zamfir edited comment on ARTEMIS-1285 at 4/17/18 8:21 PM:
----------------------------------------------------------------------------
The example is simple. Our set-up uses JGroups TCPPING discovery with
"initial_hosts" set to the live nodes only. To avoid the JGroups "no
physical address for node: UUID" warning, we have set "send_cache_on_join"
and "return_entire_cache" to true on the TCPPING stanza in the JGroups
configuration.
Below are the master/slave configurations (identical across the backups).
{code:xml}
<!-- ... Jgroups TCPPING discovery/broadcast configuration above ... -->
<!-- Working cluster, tested with ./artemis producer/consumer CLI commands
     from different nodes on different physical machines. -->

<!-- on master (live) -->
<ha-policy>
   <replication>
      <master>
         <check-for-live-server>true</check-for-live-server>
         <group-name>g1</group-name>
         <initial-replication-sync-timeout>15000</initial-replication-sync-timeout>
         <cluster-name>shared-artemis-cluster</cluster-name>
         <vote-on-replication-failure>true</vote-on-replication-failure>
      </master>
   </replication>
</ha-policy>

<!-- on replicas (2x) -->
<ha-policy>
   <replication>
      <slave>
         <allow-failback>true</allow-failback>
         <group-name>g1</group-name>
         <initial-replication-sync-timeout>15000</initial-replication-sync-timeout>
         <cluster-name>shared-artemis-cluster</cluster-name>
         <vote-retries>12</vote-retries>
         <vote-retry-wait>5000</vote-retry-wait>
      </slave>
   </replication>
</ha-policy>{code}
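The elided discovery/broadcast section referenced above would, in a typical
JGroups-backed Artemis broker.xml, look roughly like the sketch below. The
file, channel, group, and connector names are hypothetical (only the
shared-artemis-cluster name comes from our config); the point is that the
ha-policy's cluster-name must match a cluster-connection that discovers
members via the JGroups channel:
{code:xml}
<!-- Sketch only: hypothetical names, not our actual configuration. -->
<broadcast-groups>
   <broadcast-group name="bg-group1">
      <jgroups-file>jgroups-tcpping.xml</jgroups-file>
      <jgroups-channel>artemis_channel</jgroups-channel>
      <broadcast-period>5000</broadcast-period>
      <connector-ref>netty-connector</connector-ref>
   </broadcast-group>
</broadcast-groups>

<discovery-groups>
   <discovery-group name="dg-group1">
      <jgroups-file>jgroups-tcpping.xml</jgroups-file>
      <jgroups-channel>artemis_channel</jgroups-channel>
      <refresh-timeout>10000</refresh-timeout>
   </discovery-group>
</discovery-groups>

<cluster-connections>
   <!-- Name matches <cluster-name> in the ha-policy blocks above. -->
   <cluster-connection name="shared-artemis-cluster">
      <connector-ref>netty-connector</connector-ref>
      <discovery-group-ref discovery-group-name="dg-group1"/>
   </cluster-connection>
</cluster-connections>
{code}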
Note that I repeated the fresh install several times (using Ansible, Docker
and fresh LVs on LVM; everything is purged and reinstalled). On every fresh
install, "r3" enters the loop. But any manual intervention (e.g. a restart of
r3) makes it behave normally (staying in standby until the master is manually
stopped, at which point it becomes the backup for r2).
This looks like some sort of cluster "initial state" conflict, maybe related
to TCPPING + JGroups in this setup. Sadly I can't use multicast (UDP) on our
network to provide a different behaviour for comparison. I'm watching all 3
hawtio management consoles while installing the fresh cluster: one reports
live, another backup, and the 3rd throws "Broker is stopped" exceptions when
trying to view any attributes.
It's late for me; I'll take this for a spin tomorrow. If I can provide any
more information, please ask. Thanks!
> Standby slave would not announce replication to master when the slave is down
> -----------------------------------------------------------------------------
>
> Key: ARTEMIS-1285
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1285
> Project: ActiveMQ Artemis
> Issue Type: Bug
> Components: Broker
> Affects Versions: 2.1.0
> Reporter: yangwei
> Priority: Major
>
> We have a cluster of 3 instances: A is master, B is slave and C is standby
> slave. When the slave is down, we expect C to announce replication to A,
> but A stays in standalone mode the whole time. We see C waiting at
> "nodeLocator.locateNode()" in a jstack thread dump.