[ 
https://issues.apache.org/jira/browse/ARTEMIS-2710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francesco Nigro updated ARTEMIS-2710:
-------------------------------------
    Description: 
During a master restart, stopping the slave can cause the master to request a 
quorum vote to fail back, which should not happen.
The bug occurs because the master can receive a DISCONNECT event from the 
slave before STOP_CALLED, depending on which connection delivers its event 
first.

The order of events from the slave's point of view is:
* the slave (now live) is about to stop so that the master can fail back
* the slave asynchronously sends a STOP_CALLED on the connection used to 
replicate files to the master (let's call this the backup transport connection)
* the slave closes all connections except the backup transport connection, 
asynchronously sending a DISCONNECT on each of them
* the slave asynchronously sends a FAIL_OVER on the backup transport connection
* the slave waits 5 seconds before closing the backup transport connection too

With unlucky timing, a DISCONNECT event can be processed before STOP_CALLED.

  was:
During a master restart, the slave's stop can make master to request a quorum 
vote to failback, while it shouldn't happen.
The bug is happening because master can receive a DISCONNECT events coming from 
the slave that could be received before STOP_CALLED depending which connection 
will receive it first.

The order of events from the point of view of slave is:
* the slave (now live) is going to stop to allow master to failback
* the slave async send a STOP_CALLED on the connection used to replicate files 
to master (let's call this backup transport connection)
* the slave close all the connections, but the backup transport connection, 
async sending a DISCONNECT on each connection
* the slave async send a FAIL_OVER on the backup transport connection
* the slave await 5 seconds before closing the backup transport connection too

Un unlucky timing can make the DISCONNECT event to be processed before 
STOP_CALLED


> Master failback don't need quorum vote while slave in stopping
> --------------------------------------------------------------
>
>                 Key: ARTEMIS-2710
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-2710
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>            Reporter: Francesco Nigro
>            Priority: Minor
>
> During a master restart, stopping the slave can cause the master to request a 
> quorum vote to fail back, which should not happen.
> The bug occurs because the master can receive a DISCONNECT event from the 
> slave before STOP_CALLED, depending on which connection delivers its event 
> first.
> The order of events from the slave's point of view is:
> * the slave (now live) is about to stop so that the master can fail back
> * the slave asynchronously sends a STOP_CALLED on the connection used to 
> replicate files to the master (let's call this the backup transport connection)
> * the slave closes all connections except the backup transport connection, 
> asynchronously sending a DISCONNECT on each of them
> * the slave asynchronously sends a FAIL_OVER on the backup transport connection
> * the slave waits 5 seconds before closing the backup transport connection too
> With unlucky timing, a DISCONNECT event can be processed before STOP_CALLED.
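
The ordering problem described in the issue can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: the class `FailbackOrderingSketch`, the `Event` enum, and the method `wouldRequestQuorumVote` are hypothetical names invented for illustration and are not part of the Artemis code base. The model assumes that the master would request a quorum vote when it processes a DISCONNECT without having seen STOP_CALLED first, which is the behaviour the issue reports.

```java
import java.util.List;

public class FailbackOrderingSketch {

    // Event names are taken from the issue text; the enum itself is
    // hypothetical and does not mirror any Artemis type.
    enum Event { STOP_CALLED, DISCONNECT, FAIL_OVER }

    /**
     * Models the master's view of the slave's shutdown.
     * Returns true if a DISCONNECT is processed before any STOP_CALLED,
     * which in this model is the case where a quorum vote is requested.
     */
    static boolean wouldRequestQuorumVote(List<Event> processedOrder) {
        boolean stopCalledSeen = false;
        for (Event e : processedOrder) {
            if (e == Event.STOP_CALLED) {
                stopCalledSeen = true;
            } else if (e == Event.DISCONNECT && !stopCalledSeen) {
                // The master cannot tell a planned stop from a failure here.
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Order in which the slave sends the events.
        List<Event> sent = List.of(Event.STOP_CALLED, Event.DISCONNECT, Event.FAIL_OVER);
        // Possible processing order on the master, because the events
        // travel on different connections.
        List<Event> unlucky = List.of(Event.DISCONNECT, Event.STOP_CALLED, Event.FAIL_OVER);

        System.out.println(wouldRequestQuorumVote(sent));    // prints "false"
        System.out.println(wouldRequestQuorumVote(unlucky)); // prints "true"
    }
}
```

The sketch only shows why the send order on the slave does not constrain the processing order on the master; it does not describe how the issue is fixed.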



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
