[jira] [Commented] (ARTEMIS-2916) Two servers becoming Live using JDBC Shared Store

Francesco Nigro (Jira) Thu, 24 Sep 2020 10:40:56 -0700


    [ 
https://issues.apache.org/jira/browse/ARTEMIS-2916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201696#comment-17201696
 ]


Francesco Nigro commented on ARTEMIS-2916:
------------------------------------------

Hi [~apachedev], re 

> I see that ARTEMIS-2421 was fixed only in the filesystem use-case.

What makes you believe it is the same issue?

> we got suddenly one Backup server becoming Live too

Please provide the servers' logs: if an existing backup was able to become 
live, it means that the current live broker's lease lock has expired. 
This can happen if:
* there is a HUGE application pause (GC or a very slow connection), but 
subsequent timed renew attempts would make the broker kill itself 
* the database timestamp granularity is too coarse
* the database is very distant from the brokers, i.e. the timestamp from it 
isn't reliable enough
* NTP (e.g. on the DBMS) has moved the clock forward, making the lock appear 
expired
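
To illustrate the cases above, here is a minimal sketch of a time-based lease 
check. It is not the actual Artemis JDBC lease lock code: the `Lease` class, 
the `renew` and `is_expired` helpers and all the millisecond values are 
hypothetical, and it simply assumes that the lease stores an expiration 
timestamp which a backup compares with the time reported by the database.

```python
from dataclasses import dataclass


@dataclass
class Lease:
    """Hypothetical lease row: who holds it and when it expires (ms, DB clock)."""
    holder_id: str
    expires_at_ms: int


def renew(lease: Lease, db_now_ms: int, expiration_ms: int) -> Lease:
    """The live broker extends its lease starting from the current database time."""
    return Lease(lease.holder_id, db_now_ms + expiration_ms)


def is_expired(lease: Lease, db_now_ms: int) -> bool:
    """A backup may take over only once the database time reaches the lease expiry."""
    return db_now_ms >= lease.expires_at_ms


# Assumed value, for illustration only.
EXPIRATION_MS = 20_000

# The live broker renews its lease at database time 1_000_000 ms.
lease = renew(Lease("live-broker", 0), db_now_ms=1_000_000, expiration_ms=EXPIRATION_MS)

# Normal case: a backup checks 5 s after the renewal and sees a valid lease.
normal = is_expired(lease, db_now_ms=1_005_000)

# Long pause (e.g. GC): no renewal for 25 s, longer than the expiration period.
after_pause = is_expired(lease, db_now_ms=1_000_000 + 25_000)

# Clock jump: 2 s after the renewal the time source moves 30 s forward.
after_clock_jump = is_expired(lease, db_now_ms=1_002_000 + 30_000)

print(normal, after_pause, after_clock_jump)
```

In this sketch only the first check leaves the lease valid; in the other two 
a backup would consider the lock expired even though the live broker is still 
running, which is why the logs and timestamps matter.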

We need logs (ideally debug-level logs, which are very descriptive) to 
understand what happened here.

> Two servers becoming Live using JDBC Shared Store
> -------------------------------------------------
>
>                 Key: ARTEMIS-2916
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-2916
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 2.13.0
>            Reporter: Apache Dev
>            Priority: Critical
>
> We have a scenario similar to the one described in ARTEMIS-2421, but using:
>  * Artemis 2.13.0
>  * JDBC Shared Store
>  * 1 Master currently down
>  * 3 Slave
>  ** 1 Live
>  ** 2 Backup
>  
> All 3 slaves are configured with:
> {quote}<ha-policy>
>    <shared-store>
>       <slave>
>          <allow-failback>false</allow-failback>
>          <failover-on-shutdown>true</failover-on-shutdown>
>       </slave>
>    </shared-store>
> </ha-policy>
> {quote}
>  
> After 2 days of activities, with a single Slave working as Live, we got 
> suddenly one Backup server becoming Live too, while the other Live server was 
> still working.
> No warnings or errors were logged. The backup server just started creating 
> the configured addresses and queues and starting connectors, then it logged 
> "AMQ221010: Backup Server is now live". 
> Meanwhile, the third slave broker started to log continuously:
> {{AMQ212034: There are more than one servers on the network broadcasting the 
> same node id. You will see this message exactly once (per node) if a node is 
> restarted, in which case it can be safely ignored. But if it is logged 
> continuously it means you really do have more than one node on the same 
> network active concurrently with the same node id. This could occur if you 
> have a backup node active at the same time as its live node. nodeID=...}}
>  
> Final scenario was:
>  * 1 Master down
>  * 3 Slave
>  ** 2 Live
>  ** 1 Backup
>  
> I see that ARTEMIS-2421 was fixed only in the filesystem use-case.
> Should it be fixed for JDBC too?
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
