[ 
https://issues.apache.org/jira/browse/ARTEMIS-2916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203646#comment-17203646
 ] 

Francesco Nigro edited comment on ARTEMIS-2916 at 9/29/20, 5:44 AM:
--------------------------------------------------------------------

Hi [~apachedev]

looking at the logs I see a different story: 
 # broker1 is in an unknown state: is it isolated? I cannot tell from the logs, 
and the logs between 9/23/20 4:32:40:937 and 9/23/20 4:33:03:303 have a gap of 
a few seconds. A GC pause?
 # broker2 is not in a healthy state; it could be either a slow connection or 
scarce CPU resources: see the many "backup lock renew period lasted X ms 
instead of Y ms" messages. I strongly suggest tuning the renew timeout (and 
expiration time).
 # broker3 has the same log issue as the other 2 servers: the logs are not 
sufficient to investigate further.
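
As a sketch, the JDBC lock lease timings can be tuned in broker.xml under the 
database-store configuration (the values below are illustrative only; pick them 
according to your DB latency and check the defaults for your Artemis version):

{code:xml}
<store>
   <database-store>
      <!-- other JDBC settings (connection URL, driver, table names) go here -->
      <!-- how often the live broker renews its lock lease -->
      <jdbc-lock-renew-period>2000</jdbc-lock-renew-period>
      <!-- how long a lease lasts before a backup may consider it expired -->
      <jdbc-lock-expiration>20000</jdbc-lock-expiration>
   </database-store>
</store>
{code}

Keeping the expiration well above the renew period gives a slow broker or a 
slow database more headroom before a backup decides the live lock is stale.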



> Two servers becoming Live using JDBC Shared Store
> -------------------------------------------------
>
>                 Key: ARTEMIS-2916
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-2916
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 2.13.0
>            Reporter: Apache Dev
>            Priority: Critical
>         Attachments: logs.zip
>
>
> We have a scenario similar to the one described in ARTEMIS-2421, but using:
>  * Artemis 2.13.0
>  * JDBC Shared Store
>  * 1 Master (currently down)
>  * 3 Slaves
>  ** 1 Live
>  ** 2 Backup
> All 3 slaves are configured with:
> {code:xml}
> <ha-policy>
>    <shared-store>
>       <slave>
>          <allow-failback>false</allow-failback>
>          <failover-on-shutdown>true</failover-on-shutdown>
>       </slave>
>    </shared-store>
> </ha-policy>
> {code}
>  
> After 2 days of activity, with a single slave working as live, suddenly 
> another slave became live too while the first live server was still working. 
> No warnings/errors were logged. The backup server just started creating the 
> configured addresses and queues and starting its connectors, then logged 
> "AMQ221010: Backup Server is now live". 
> Meanwhile, the third slave broker started logging continuously:
> {noformat}
> AMQ212034: There are more than one servers on the network broadcasting the 
> same node id. You will see this message exactly once (per node) if a node is 
> restarted, in which case it can be safely ignored. But if it is logged 
> continuously it means you really do have more than one node on the same 
> network active concurrently with the same node id. This could occur if you 
> have a backup node active at the same time as its live node. nodeID=...
> {noformat} 
> Final scenario was:
>  * 1 Master down
>  * 3 Slaves
>  ** 2 Live
>  ** 1 Backup 
> I see that ARTEMIS-2421 was fixed only for the filesystem use case. Should it 
> be fixed for JDBC too?
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
