On 26/07/2017 at 13:59, Christoph John wrote:
> Hi,
>
> I am a developer and maintainer of the QuickFIX/J project
> (https://github.com/quickfix-j/quickfixj) and I have a question
> regarding NioSocketConnectors.
>
> We are facing a problem when there is a process that constantly (every
> 30 seconds) tries to connect to a counterparty and the connection is
> established but dropped shortly after. Then sometimes the
> NioProcessors/NioSocketConnectors are not cleaned up properly. In the
> stack trace we see them hanging in a call to dispose:
>
> "NioProcessor-1140" #239 prio=5 os_prio=0 tid=0x0000000001fe1800
> nid=0x2523 runnable [0x00007f9c67e8f000]
>    java.lang.Thread.State: RUNNABLE
>         at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>         at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
>         at
> sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
>         at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
>         - locked <0x00000000f6699e60> (a sun.nio.ch.Util$3)
>         - locked <0x00000000f6699e50> (a
> java.util.Collections$UnmodifiableSet)
>         - locked <0x00000000f6699c18> (a sun.nio.ch.EPollSelectorImpl)
>         at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
>         at
> org.apache.mina.transport.socket.nio.NioProcessor.select(NioProcessor.java:98)
>         at
> org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:1075)
>         at
> org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:748)
>
> "NioSocketConnector-68" #238 prio=5 os_prio=0 tid=0x00007f9c70caf000
> nid=0x2522 in Object.wait() [0x00007f9c6af9f000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         at
> org.apache.mina.core.future.DefaultIoFuture.await0(DefaultIoFuture.java:209)
>         - locked <0x00000000f66ac718> (a
> org.apache.mina.core.future.DefaultIoFuture)
>         at
> org.apache.mina.core.future.DefaultIoFuture.awaitUninterruptibly(DefaultIoFuture.java:141)
>         at
> org.apache.mina.core.polling.AbstractPollingIoProcessor.dispose(AbstractPollingIoProcessor.java:188)
>         at
> org.apache.mina.core.service.SimpleIoProcessorPool.dispose(SimpleIoProcessorPool.java:329)
>         - locked <0x00000000f66ac750> (a java.lang.Object)
>         at
> org.apache.mina.core.polling.AbstractPollingIoConnector$Connector.run(AbstractPollingIoConnector.java:582)
>         at
> org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:748)
>
> It does not happen very often: about 5% of the connection attempts
> leave a NioSocketConnector hanging.
> It only seems to happen though when the connection is disconnected by
> "javax.net.ssl.SSLHandshakeException: SSL handshake failed". Although
> there are cases when there is no leak even on an SSLHandshakeException.
> If the connection was reset "normally" by "java.io.IOException:
> Connection reset by peer" then the leak does not seem to occur. It
> also does not occur when the connection is refused right away.
>
> Since this seems to be related to SSL connections: is there something
> that we need to take care of when using the SSL filter?
>
> The code for the IoSessionInitiator can be found here:
> https://github.com/quickfix-j/quickfixj/blob/master/quickfixj-core/src/main/java/quickfix/mina/initiator/IoSessionInitiator.java
> I have added some comments in this gist (starting with "chrjohn"):
> https://gist.github.com/chrjohn/2671f06d80e8d917d9061b573477ec5b
>
> I cannot rule out that we might be doing something wrong here, so any
> pointer is appreciated. :)

I see in your code that you are waiting 2s for the connection to be
established, and if this timeout is reached, you try again, up to the
point where you bail out. In this case, the connection is not cleaned up,
AFAICT.

Is that correct?


OTOH, it does not necessarily make a lot of sense to poll the connector:
as MINA is fully asynchronous, you'll be informed when the connection
is established, and if not, you can use the idle event to know that your
connection is idling (an idle event is generated every second, so
waiting for, say, 30 idle events will let you manage a 30s timeout, for
instance). If your connection idles for too long, simply dispose of it.
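
To make the idle-counting idea concrete, here is a minimal sketch of the
bookkeeping only. It deliberately does not use the MINA API, so it runs on
its own: IdleTimeoutCounter and its methods are hypothetical names, not part
of MINA or QuickFIX/J. The assumption is that you would call onIdle() from
your handler's idle callback and onActivity() whenever data is read or
written, with one idle event per second as described above.

```java
// Hypothetical helper, not part of MINA: counts consecutive idle events
// and reports when the configured limit has been reached.
final class IdleTimeoutCounter {
    private final int maxIdleEvents;
    private int idleEvents;

    IdleTimeoutCounter(int maxIdleEvents) {
        if (maxIdleEvents <= 0) {
            throw new IllegalArgumentException("maxIdleEvents must be positive");
        }
        this.maxIdleEvents = maxIdleEvents;
    }

    /** Call on every idle event; returns true once the limit is reached. */
    synchronized boolean onIdle() {
        idleEvents++;
        return idleEvents >= maxIdleEvents;
    }

    /** Call whenever data is read or written; resets the idle count. */
    synchronized void onActivity() {
        idleEvents = 0;
    }

    /** Number of consecutive idle events seen since the last activity. */
    synchronized int idleEvents() {
        return idleEvents;
    }

    public static void main(String[] args) {
        // Assumption: one idle event per second, so 30 events is roughly 30s.
        IdleTimeoutCounter counter = new IdleTimeoutCounter(30);
        for (int event = 1; event <= 60; event++) {
            if (counter.onIdle()) {
                // In a real handler, this is where the session and its
                // connector would be closed and disposed.
                System.out.println("Timed out after " + event + " idle events");
                break;
            }
        }
    }
}
```

The point of keeping the count in one place is that the timeout is driven
by events delivered to you, so no thread has to block while polling the
connector, and the cleanup happens exactly once, when onIdle() first
returns true.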

>
> Thanks in advance for your help and best regards,
> Chris.
>

-- 
Emmanuel Lecharny

Symas.com
directory.apache.org
