[ 
https://issues.apache.org/jira/browse/AMQ-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martin Serrano updated AMQ-3364:
--------------------------------

    Description: 
I see this problem consistently when a producer is continuously sending 
messages and the master is shut down in a controlled fashion.  When the master 
broker is undergoing a controlled shutdown, the BrokerService.stop() method 
stops things in this order: 

* services 
* connectors 
* registered vm transports 
* broker 

So there is a period where the broker will still process sends after other 
(apparently necessary) facilities have been shut down.  I have not followed the 
code paths to understand exactly what goes wrong, but I traced enough to tell 
that messages sent in this interval can disappear.  That is, the client send 
call will return without error but after failover the slave will not replay the 
message.  

This appears to only be an issue during a controlled shutdown.  Process death 
should not cause this problem. 

I'm currently working around this by having the BrokerService set a stopping 
flag and having the MasterBroker check this flag and reject sends (with a new 
exception class) if true.  My client code then detects this case and just 
retries until the failover is complete.  It seems there should be a better, 
more integrated solution in which the FailoverTransport code handles this case 
on the client's behalf, so that client code does not have to.
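
For illustration, the workaround can be sketched roughly as follows.  This is a 
self-contained model, not the actual patch and not ActiveMQ code: SketchBroker, 
BrokerStoppingException, beginStop() and sendWithRetry() are hypothetical names 
made up for this example, and trying the brokers in turn only stands in for 
what a failover-aware client would do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class StoppingGateSketch {

    /** Hypothetical stand-in for the "new exception class" used to reject sends. */
    static class BrokerStoppingException extends Exception {
        BrokerStoppingException(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for a broker; it only models the stopping flag. */
    static class SketchBroker {
        private boolean stopping = false;
        private final List<String> accepted = new ArrayList<>();

        /** Set the flag first, before any other facility is stopped. */
        synchronized void beginStop() {
            stopping = true;
        }

        /** Reject the send outright instead of accepting a message that may be lost. */
        synchronized void send(String message) throws BrokerStoppingException {
            if (stopping) {
                throw new BrokerStoppingException("broker is stopping, send rejected");
            }
            accepted.add(message);
        }

        synchronized List<String> accepted() {
            return new ArrayList<>(accepted);
        }
    }

    /**
     * Client-side retry: try the brokers in turn until one accepts the message.
     * Returns the number of attempts used. A real client would also back off
     * between attempts; that is left out to keep the sketch short.
     */
    static int sendWithRetry(List<SketchBroker> brokers, String message, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            SketchBroker target = brokers.get((attempt - 1) % brokers.size());
            try {
                target.send(message);
                return attempt;
            } catch (BrokerStoppingException e) {
                // Rejected because the broker is stopping: move on to the next one.
            }
        }
        throw new IllegalStateException(
                "no broker accepted the message after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        SketchBroker master = new SketchBroker();
        SketchBroker slave = new SketchBroker();
        List<SketchBroker> brokers = Arrays.asList(master, slave);

        sendWithRetry(brokers, "before shutdown", 5);
        master.beginStop();
        sendWithRetry(brokers, "during shutdown", 5);

        System.out.println("master accepted: " + master.accepted());
        System.out.println("slave accepted:  " + slave.accepted());
    }
}
```

The point of the flag is that a send during the shutdown window fails loudly 
rather than returning without error, so the producer knows it has to resend.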

  was:
I see this problem consistently when a producer is continuously sending 
messages and the master is shutdown in a controlled fashion.  When the master 
broker is undergoing a controlled shutdown, the BrokerService.stop() method 
stops things in this order: 

* services 
* connectors 
* registered vm transports 
* broker 

So there is a period where the broker will still process sends after other 
(apparently necessary) facilities have been shutdown.  I have not followed the 
code paths to understand exactly what goes wrong, but I traced enough to tell 
that messages sent in this interval can disappear.  That is, the client send 
call will return without error but after failover the slave will not replay the 
message.  

This appears to only be an issue during a controlled shutdown.  Process death 
should not cause this problem. 

I'm currently working around this by having the BrokerService set a stopping 
flag and having the MasterBroker check this flag and reject sends (with a new 
exception class) if true.  My client code then detects this case and just 
retries until the failover is complete.


> Broker can lose messages during master/slave failover when master undergoes a 
> controlled shutdown
> -------------------------------------------------------------------------------------------------
>
>                 Key: AMQ-3364
>                 URL: https://issues.apache.org/jira/browse/AMQ-3364
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 5.4.2, 5.5.0
>            Reporter: Martin Serrano
>            Priority: Critical

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
