[
https://issues.apache.org/jira/browse/DIRMINA-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844687#comment-16844687
]
Guus der Kinderen commented on DIRMINA-1111:
--------------------------------------------
The CPU cycles that get burned are reported to be "system", not "user" cycles.
We've obtained a Call Tree graph from JProfiler, from which I included a
screenshot (the entire thing is 18MB). Just under the section from which I
created this screenshot, there's another 10.2% native CPU usage for
{{java.lang.System.currentTimeMillis}} that's called from within
{{org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.process}}
From what I can deduce, this loop iterates over this snippet a lot. We don't see {{Broken connection}} or {{Create a new selector. Selected is 0, delta =}} being logged, which suggests that the workaround for the EPOLL / spinning-selector bug is not even being triggered.
{code:title=org.apache.mina.core.polling.AbstractPollingIoProcessor.Processor#run|java}
/* (...snip...) */
// This select has a timeout so that we can manage
// idle sessions when we get out of the select every
// second. (Note: this is a hack to avoid creating
// a dedicated thread.)
long t0 = System.currentTimeMillis();
int selected = select(SELECT_TIMEOUT);
long t1 = System.currentTimeMillis();
long delta = t1 - t0;

if (!wakeupCalled.getAndSet(false) && (selected == 0) && (delta < 100)) {
    // Last chance: the select() may have been
    // interrupted because we have had a closed channel.
    if (isBrokenConnection()) {
        LOG.warn("Broken connection");
    } else {
        // Ok, we are hit by the nasty epoll spinning.
        // Basically, there is a race condition which causes a closing
        // file descriptor not to be considered as available as a
        // selected channel, but it stopped the select. The next time
        // we call select(), it will exit immediately for the same
        // reason, and do so forever, consuming 100% CPU.
        // We have to destroy the selector, and register all the
        // sockets on a new one.
        if (nbTries == 0) {
            LOG.warn("Create a new selector. Selected is 0, delta = " + delta);
            registerNewSelector();
            nbTries = 10;
        } else {
            nbTries--;
        }
    }
} else {
    nbTries = 10;
}
/* (...snip...) */
{code}
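For reference, the recovery step that {{registerNewSelector()}} performs can be sketched in plain NIO: move every still-valid registration onto a fresh {{Selector}} and close the broken one. This is an illustration only, not MINA's actual source; the class and method names ({{SelectorRebuild}}, {{rebuild}}) are mine.

```java
import java.io.IOException;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;

// Sketch (not MINA's code) of the "destroy the selector and register
// everything on a new one" recovery used against the epoll spin bug.
public class SelectorRebuild {

    static Selector rebuild(Selector broken) throws IOException {
        Selector fresh = Selector.open();
        for (SelectionKey key : broken.keys()) {
            if (!key.isValid()) {
                continue; // skip channels that were already closed/cancelled
            }
            SelectableChannel ch = key.channel();
            // Re-register with the same interest set and attachment;
            // closing the old selector below cancels its old keys.
            ch.register(fresh, key.interestOps(), key.attachment());
        }
        broken.close();
        return fresh;
    }

    public static void main(String[] args) throws IOException {
        Selector old = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.configureBlocking(false);
        server.register(old, SelectionKey.OP_ACCEPT, "session-state");

        Selector fresh = rebuild(old);
        System.out.println(fresh.keys().size()); // 1: the channel moved over
        System.out.println(old.isOpen());        // false: broken selector destroyed
        fresh.close();
        server.close();
    }
}
```

The key detail is that the interest ops and attachment are read from the old key before the old selector is closed; the spinning file descriptor dies with the old epoll instance, which is why the workaround stops the 100% CPU loop when it actually fires.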
!image-2019-05-21-11-37-41-931.png!
> 100% CPU (epoll bug) on 2.1.x, Linux only
> -----------------------------------------
>
> Key: DIRMINA-1111
> URL: https://issues.apache.org/jira/browse/DIRMINA-1111
> Project: MINA
> Issue Type: Bug
> Affects Versions: 2.1.2
> Reporter: Guus der Kinderen
> Priority: Major
> Attachments: image-2019-05-21-11-37-41-931.png
>
>
> We're getting
> [reports|https://discourse.igniterealtime.org/t/openfire-4-3-2-cpu-goes-at-100-after-a-few-hours-on-linux/85119/13]
> of a bug that causes 100% CPU usage on Linux (the problem does not occur on
> Windows).
> This problem occurs in 2.1.2, but does _not_ occur in 2.0.21.
> Is this a regression of the epoll selector bug in DIRMINA-678 ?
> A stack trace of one of the spinning threads:
> {code}
> "NioProcessor-3" #55 prio=5 os_prio=0 tid=0x00007f0408002000 nid=0x2a6a runnable [0x00007f0476dee000]
>    java.lang.Thread.State: RUNNABLE
>         at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>         at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
>         at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
>         at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
>         - locked <0x00000004c486b748> (a sun.nio.ch.Util$3)
>         - locked <0x00000004c486b738> (a java.util.Collections$UnmodifiableSet)
>         - locked <0x00000004c420ccb0> (a sun.nio.ch.EPollSelectorImpl)
>         at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
>         at org.apache.mina.transport.socket.nio.NioProcessor.select(NioProcessor.java:112)
>         at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:616)
>         at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
>
>    Locked ownable synchronizers:
>         - <0x00000004c4d03530> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> {code}
> More info is available at
> https://discourse.igniterealtime.org/t/openfire-4-3-2-cpu-goes-at-100-after-a-few-hours-on-linux/85119/13
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)