Hi,

thanks for your reply. I will check the points you mentioned and come back.
Most of the code in that class is several years old. It started off as a C++ project (quickfix) and previously used MINA 1.x, so it may well be that some things there are done suboptimally nowadays. :)

Cheers,
Chris.

On 26/07/17 16:33, Emmanuel Lécharny wrote:

On 26/07/2017 at 13:59, Christoph John wrote:
Hi,

I am a developer and maintainer of the QuickFIX/J project
(https://github.com/quickfix-j/quickfixj) and I have a question
regarding NioSocketConnectors.

We are facing a problem when there is a process that constantly (every
30 seconds) tries to connect to a counterparty and the connection is
established but dropped shortly after. Then sometimes the
NioProcessors/NioSocketConnectors are not cleaned up properly. In the
stack trace we see them hanging in a call to dispose:

"NioProcessor-1140" #239 prio=5 os_prio=0 tid=0x0000000001fe1800 nid=0x2523 runnable [0x00007f9c67e8f000]
    java.lang.Thread.State: RUNNABLE
         at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
         at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
         at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
         at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
         - locked <0x00000000f6699e60> (a sun.nio.ch.Util$3)
         - locked <0x00000000f6699e50> (a java.util.Collections$UnmodifiableSet)
         - locked <0x00000000f6699c18> (a sun.nio.ch.EPollSelectorImpl)
         at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
         at org.apache.mina.transport.socket.nio.NioProcessor.select(NioProcessor.java:98)
         at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:1075)
         at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
         at java.lang.Thread.run(Thread.java:748)

"NioSocketConnector-68" #238 prio=5 os_prio=0 tid=0x00007f9c70caf000 nid=0x2522 in Object.wait() [0x00007f9c6af9f000]
    java.lang.Thread.State: TIMED_WAITING (on object monitor)
         at java.lang.Object.wait(Native Method)
         at org.apache.mina.core.future.DefaultIoFuture.await0(DefaultIoFuture.java:209)
         - locked <0x00000000f66ac718> (a org.apache.mina.core.future.DefaultIoFuture)
         at org.apache.mina.core.future.DefaultIoFuture.awaitUninterruptibly(DefaultIoFuture.java:141)
         at org.apache.mina.core.polling.AbstractPollingIoProcessor.dispose(AbstractPollingIoProcessor.java:188)
         at org.apache.mina.core.service.SimpleIoProcessorPool.dispose(SimpleIoProcessorPool.java:329)
         - locked <0x00000000f66ac750> (a java.lang.Object)
         at org.apache.mina.core.polling.AbstractPollingIoConnector$Connector.run(AbstractPollingIoConnector.java:582)
         at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
         at java.lang.Thread.run(Thread.java:748)

It does not happen very often: about 5% of the connection attempts
leave a NioSocketConnector hanging.
It only seems to happen when the connection is dropped with
"javax.net.ssl.SSLHandshakeException: SSL handshake failed", although
an SSLHandshakeException does not always cause a leak.
If the connection is reset "normally" with "java.io.IOException:
Connection reset by peer", the leak does not seem to occur. It
also does not occur when the connection is refused right away.

Since this seems to be related to SSL connections: is there something
that we need to take care of when using the SSL filter?

The code for the IoSessionInitiator can be found here:
https://github.com/quickfix-j/quickfixj/blob/master/quickfixj-core/src/main/java/quickfix/mina/initiator/IoSessionInitiator.java
I have added some comments in this gist (starting with "chrjohn"):
https://gist.github.com/chrjohn/2671f06d80e8d917d9061b573477ec5b

I cannot rule out that we might be doing something wrong here, so any
pointer is appreciated. :)

I see in your code that you are waiting 2s for the connection to be
established, and if this timeout is reached, you try again, up to the
point where you bail out. In this case, the connection is not cleaned up, AFAICT.

Is that correct?


OTOH, it does not necessarily make much sense to poll the connector:
as MINA is fully asynchronous, you'll be informed when the connection
is established, and if not, you can use the idle event to know that your
connection is idling (an idle event is generated every second, so
waiting for, say, 30 idle events will let you manage a 30s timeout, for
instance). If your connection idles for too long, simply dispose of it.
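
The idle-event counting described above could be sketched roughly as follows. This is a stand-alone illustration in plain Java, not MINA code: the class `IdleTimeoutTracker` and its methods are hypothetical names invented for this example, and it assumes one idle event per second as stated above.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical helper (not part of MINA or QuickFIX/J): counts consecutive
 * idle events and reports when a configured limit has been reached.
 */
final class IdleTimeoutTracker {
    private final int maxIdleEvents;
    private final AtomicInteger idleEvents = new AtomicInteger();

    IdleTimeoutTracker(int maxIdleEvents) {
        if (maxIdleEvents <= 0) {
            throw new IllegalArgumentException("maxIdleEvents must be positive");
        }
        this.maxIdleEvents = maxIdleEvents;
    }

    /** Call once per idle event; returns true once the limit is reached. */
    boolean onIdle() {
        return idleEvents.incrementAndGet() >= maxIdleEvents;
    }

    /** Call whenever traffic is seen on the connection; restarts the count. */
    void onActivity() {
        idleEvents.set(0);
    }
}
```

In a MINA handler, `onIdle()` would presumably be called from `IoHandler.sessionIdle(...)`; with a limit of 30 and one idle event per second, a `true` result corresponds to roughly 30s of idleness, at which point the session can be closed and the connector disposed.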

Thanks in advance for your help and best regards,
Chris.


--
Christoph John
Development & Support
Direct: +49 241 557080-28
Mailto:[email protected]
        


http://www.macd.com
----------------------------------------------------------------------------------------------------
MACD GmbH
Oppenhoffallee 103
D-52066 Aachen
Tel: +49 241 557080-0 | Fax: +49 241 557080-10
Amtsgericht Aachen: HRB 8151
VAT ID: DE 813021663

Managing Director: George Macdonald
----------------------------------------------------------------------------------------------------

