[ https://issues.apache.org/jira/browse/ARTEMIS-2710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Francesco Nigro updated ARTEMIS-2710:
-------------------------------------
Description:
During a master restart, stopping the slave can make the master request a quorum
vote to fail back, although it shouldn't.
The bug happens because the master can receive a DISCONNECT event from the slave
before STOP_CALLED, depending on which connection delivers its event first.
The order of events from the slave's point of view is:
* the slave (now live) stops to allow the master to fail back
* the slave asynchronously sends STOP_CALLED on the connection used to replicate
files to the master (let's call this the backup transport connection)
* the slave closes all connections except the backup transport connection,
asynchronously sending a DISCONNECT on each of them
* the slave asynchronously sends FAIL_OVER on the backup transport connection
* the slave waits 5 seconds before closing the backup transport connection too
An unlucky timing can cause the DISCONNECT event to be processed before
STOP_CALLED.
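The race described above can be modelled as a minimal, deterministic sketch. It assumes (this is not Artemis code) a simplified master that treats a DISCONNECT seen before STOP_CALLED as a possible crash of the live server and therefore requests a quorum vote. Because STOP_CALLED and DISCONNECT travel on different connections, the master may process them in either order; the sketch replays both orders. The class, enum and method names (FailbackRaceSketch, MasterView, replay) are hypothetical.

```java
import java.util.List;

public class FailbackRaceSketch {

    // Hypothetical event kinds, named after the events in the issue description.
    enum Event { STOP_CALLED, DISCONNECT, FAIL_OVER }

    // Hypothetical, simplified model of the master's view; not the Artemis implementation.
    static final class MasterView {
        private boolean stopCalled = false;
        private boolean quorumVoteRequested = false;

        void onEvent(Event e) {
            switch (e) {
                case STOP_CALLED:
                    // The live (slave) announced a graceful stop on the backup transport connection.
                    stopCalled = true;
                    break;
                case DISCONNECT:
                    // Without a prior STOP_CALLED the master cannot tell a graceful stop
                    // from a crash, so in this model it asks for a quorum vote.
                    if (!stopCalled) {
                        quorumVoteRequested = true;
                    }
                    break;
                case FAIL_OVER:
                    // Not relevant to the race being modelled.
                    break;
            }
        }

        boolean quorumVoteRequested() {
            return quorumVoteRequested;
        }
    }

    // Replays one processing order on the master and reports whether a vote was requested.
    static boolean replay(List<Event> order) {
        MasterView master = new MasterView();
        for (Event e : order) {
            master.onEvent(e);
        }
        return master.quorumVoteRequested();
    }

    public static void main(String[] args) {
        // Intended order: STOP_CALLED is processed first, so no vote is requested.
        System.out.println(replay(List.of(Event.STOP_CALLED, Event.DISCONNECT, Event.FAIL_OVER)));
        // Unlucky order: a DISCONNECT from another connection is processed first.
        System.out.println(replay(List.of(Event.DISCONNECT, Event.STOP_CALLED, Event.FAIL_OVER)));
    }
}
```

Running main prints false for the intended order and true for the unlucky one, which is the spurious quorum vote reported here.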
was:
During a master restart, the backup's stop can make the master request a
quorum vote to fail back, although it shouldn't.
The bug happens because the master can listen to DISCONNECT events coming from
the backup that could be received before STOP_CALLED, depending on which
connections are established between the two.
> Master failback don't need quorum vote while slave in stopping
> --------------------------------------------------------------
>
> Key: ARTEMIS-2710
> URL: https://issues.apache.org/jira/browse/ARTEMIS-2710
> Project: ActiveMQ Artemis
> Issue Type: Bug
> Reporter: Francesco Nigro
> Priority: Minor
--
This message was sent by Atlassian Jira
(v8.3.4#803005)