[ https://issues.apache.org/jira/browse/ARTEMIS-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16729028#comment-16729028 ]
ASF GitHub Bot commented on ARTEMIS-2213:
-----------------------------------------
Github user franz1981 commented on the issue:
https://github.com/apache/activemq-artemis/pull/2481
I will be able to look into that in the next few days :)
In the meantime, could you collect time-to-safepoint and GC pause times, if
possible, e.g. with -XX:+PrintGCApplicationStoppedTime?
2 minutes seems too long a pause TBH, but it is worth taking a look if you rely
heavily on the MAPPED journal and/or paging, given that major page faults can
cause long stalls similar to a very long full GC. Still, TBH, nothing that long
(2 minutes is a lot!).
As an additional suggestion, you could run a Java program that uses just one
core and measures the elapsed time between 2 consecutive nanoTime calls,
recording the wall-clock time at which a backward "drift" happens, to check
whether a broker shutdown happened around the same time. Does that make sense?
> Clock drift causing server halt
> -------------------------------
>
> Key: ARTEMIS-2213
> URL: https://issues.apache.org/jira/browse/ARTEMIS-2213
> Project: ActiveMQ Artemis
> Issue Type: Bug
> Components: Broker
> Affects Versions: 2.6.3
> Reporter: yangwei
> Priority: Critical
>
> In our production cluster some brokers crashed. There was nothing unusual in
> the stack dump. After digging into the code, we found that a component was
> incorrectly marked as expired. When the clock drifted backwards, the leave
> timestamp was earlier than the enter timestamp. If the component was not
> entered again within the default timeout of 120000 ms, it was considered
> expired and the server was halted.
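The failure mode in the report can be illustrated with a small hypothetical sketch. CriticalWatch and its methods are illustrative names, not the actual Artemis critical-analyzer classes, and timestamps are passed in explicitly (in ms) so that a backward clock step can be simulated. isExpiredByTimestamps infers "entered but not yet left" from leave < enter, as the report describes. isExpiredByState is one possible drift-tolerant alternative that tracks the state explicitly. It is not necessarily the fix adopted upstream.

```java
/**
 * Hypothetical enter/leave watchdog, loosely modelled on the behaviour described
 * in ARTEMIS-2213. Timestamps (ms) are passed in so clock steps can be simulated.
 */
public class CriticalWatch {

    private volatile long enterTime; // timestamp of the last enter()
    private volatile long leaveTime; // timestamp of the last leave()
    private volatile boolean inside; // explicit state, independent of timestamp order

    public void enter(long nowMillis) {
        // Written before the flag, so a reader that sees inside == true also sees
        // this enterTime (or a newer one).
        enterTime = nowMillis;
        inside = true;
    }

    public void leave(long nowMillis) {
        leaveTime = nowMillis;
        inside = false;
    }

    /** Fragile: treats leaveTime < enterTime as "entered but not left yet". */
    public boolean isExpiredByTimestamps(long nowMillis, long timeoutMillis) {
        return leaveTime < enterTime && nowMillis - enterTime > timeoutMillis;
    }

    /** Drift-tolerant alternative: uses the explicit flag instead of comparing timestamps. */
    public boolean isExpiredByState(long nowMillis, long timeoutMillis) {
        return inside && nowMillis - enterTime > timeoutMillis;
    }
}
```

For example, enter(10_000) followed by leave(9_000) (the clock stepped back in between) makes isExpiredByTimestamps(200_000, 120_000) return true even though the component already left, while isExpiredByState returns false. A forward clock jump could still trigger a false expiry in either variant. This is one reason to prefer a time source that is not tied to wall-clock adjustments, such as System.nanoTime().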
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)