Hi,

thanks for your reply.
In fact it hangs forever, i.e. until the process stops. I have attached the original message that I sent to the mailing list. The hang occurs only sometimes, and only for SSL connections with a failing handshake. Unfortunately I have no reproducible example for MINA itself. I could probably put something together for QuickFIX/J (the open source project I am working on).

My OS is Ubuntu 14.04.5 with JDK1.8_144. The problem appears only rarely on my machine but almost every time on the TravisCI build server (https://travis-ci.org/quickfix-j/quickfixj/builds/283210509). As a result, some of the SSL-related tests are failing. TravisCI has a nearly identical setup with JDK1.8_144 and Debian Linux.

What would be a good starting point for creating a test? I see that there is an SslTest in the mina-core module. So I would probably have to change that test to connect repeatedly, get a handshake exception every time, and then take a number of stack traces.
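
One way to get a failing handshake on every attempt, sketched here with plain JDK classes only: a peer that reads the ClientHello and answers with plain text instead of TLS. The class and method names (FailingHandshakePeer, startPlainTextPeer, attemptHandshake) are made up for illustration and are not part of MINA or QuickFIX/J. The client side uses a plain SSLSocket here; in a real test it would be replaced by the MINA connector with its SSL filter.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

public class FailingHandshakePeer {

    /** Accepts one connection, consumes the first TLS record, then answers with plain text. */
    public static Thread startPlainTextPeer(ServerSocket server) {
        Thread t = new Thread(() -> {
            try (Socket s = server.accept()) {
                DataInputStream in = new DataInputStream(s.getInputStream());
                // A TLS record starts with a 5-byte header; bytes 3 and 4 hold the payload length.
                byte[] header = new byte[5];
                in.readFully(header);
                int length = ((header[3] & 0xFF) << 8) | (header[4] & 0xFF);
                in.readFully(new byte[length]);
                // Reply with bytes that are not a valid TLS record, then close.
                OutputStream out = s.getOutputStream();
                out.write("this is not TLS\r\n".getBytes(StandardCharsets.US_ASCII));
                out.flush();
            } catch (IOException ignored) {
                // The peer only exists to break the handshake; its own errors do not matter.
            }
        }, "plain-text-peer");
        t.setDaemon(true);
        t.start();
        return t;
    }

    /** Tries one TLS handshake and returns the exception it failed with, or null on success. */
    public static IOException attemptHandshake(int port) {
        try (SSLSocket client = (SSLSocket) SSLSocketFactory.getDefault()
                .createSocket(InetAddress.getLoopbackAddress(), port)) {
            client.setSoTimeout(5000);
            client.startHandshake();
            return null;
        } catch (IOException e) {
            return e;
        }
    }

    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            startPlainTextPeer(server);
            IOException failure = attemptHandshake(server.getLocalPort());
            System.out.println("handshake failed with: " + failure);
        }
    }
}
```

The exact exception type and message depend on the JDK version, so a test should probably only rely on it being a javax.net.ssl.SSLException.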

Thanks,
Chris.




On 09/10/17 14:51, Jonathan Valliere wrote:
What OS / Java Version / etc;  Do you have a reproducible example?

On Mon, Oct 9, 2017 at 8:34 AM, Jonathan Valliere <jon.valli...@emoten.com> wrote:

    Let me know if it's hanging for more than 1s

    On Mon, Oct 9, 2017 at 5:08 AM, Christoph John <christoph.j...@macd.com> wrote:

        Hi,

        I have another question regarding this one. There is
        https://issues.apache.org/jira/browse/DIRMINA-1060 which also sounds a little like the
        problem I'm having. When the connectors are hanging in the call to dispose() then there
        always is an accompanying NioProcessor which is hanging in select().

        Example:
        "NioProcessor-60" #100328 prio=5 os_prio=0 tid=0x00007f2a10003000 nid=0x2e71 runnable [0x00007f2a388b1000]
           java.lang.Thread.State: RUNNABLE
                at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
                at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
                at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
                at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
                - locked <0x00000000e239c118> (a sun.nio.ch.Util$3)
                - locked <0x00000000e239c108> (a java.util.Collections$UnmodifiableSet)
                - locked <0x00000000e239bed0> (a sun.nio.ch.EPollSelectorImpl)
                at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
                at org.apache.mina.transport.socket.nio.NioProcessor.select(NioProcessor.java:98)
                at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:1075)
                at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
                at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
                at java.lang.Thread.run(Thread.java:748)


        "NioSocketConnector-38" #100326 prio=5 os_prio=0 tid=0x00007f2a3001d800 nid=0x2e6f in Object.wait() [0x00007f2a1f2d3000]
           java.lang.Thread.State: TIMED_WAITING (on object monitor)
                at java.lang.Object.wait(Native Method)
                at org.apache.mina.core.future.DefaultIoFuture.await0(DefaultIoFuture.java:209)
                - locked <0x00000000e246ae08> (a org.apache.mina.core.future.DefaultIoFuture)
                at org.apache.mina.core.future.DefaultIoFuture.awaitUninterruptibly(DefaultIoFuture.java:141)
                at org.apache.mina.core.polling.AbstractPollingIoProcessor.dispose(AbstractPollingIoProcessor.java:188)
                at org.apache.mina.core.service.SimpleIoProcessorPool.dispose(SimpleIoProcessorPool.java:329)
                - locked <0x00000000e246ae40> (a java.lang.Object)
                at org.apache.mina.core.polling.AbstractPollingIoConnector$Connector.run(AbstractPollingIoConnector.java:582)
                at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
                at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
                at java.lang.Thread.run(Thread.java:748)


        At first I thought that this was related to
        https://issues.apache.org/jira/browse/DIRMINA-1059. In that ticket the synchronization
        was improved. However, I am also running into the problem with a build of 2.0.17-SNAPSHOT
        where DIRMINA-1059 was solved.

        So my only hope was DIRMINA-1060 ;) Could this improve the situation?

        Thanks,
        Chris.


        --
        Christoph John
        Development & Support
        Direct: +49 241 557080-28
        Mailto:christoph.j...@macd.com

        http://www.macd.com
        ----------------------------------------------------------------------------------------------------
        MACD GmbH
        Oppenhoffallee 103
        D-52066 Aachen
        Tel: +49 241 557080-0 | Fax: +49 241 557080-10
        Aachen District Court: HRB 8151
        VAT ID: DE 813021663

        Managing Director: George Macdonald
        ----------------------------------------------------------------------------------------------------

        take care of the environment - print only if necessary




--- Begin Message ---
Hi,

I am a developer and maintainer of the QuickFIX/J project (https://github.com/quickfix-j/quickfixj) and I have a question regarding NioSocketConnectors.

We are facing a problem with a process that tries to connect to a counterparty every 30 seconds, where the connection is established but dropped shortly after. Sometimes the NioProcessors/NioSocketConnectors are then not cleaned up properly. In the stack traces we see them hanging in a call to dispose:

"NioProcessor-1140" #239 prio=5 os_prio=0 tid=0x0000000001fe1800 nid=0x2523 runnable [0x00007f9c67e8f000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
        - locked <0x00000000f6699e60> (a sun.nio.ch.Util$3)
        - locked <0x00000000f6699e50> (a java.util.Collections$UnmodifiableSet)
        - locked <0x00000000f6699c18> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
        at org.apache.mina.transport.socket.nio.NioProcessor.select(NioProcessor.java:98)
        at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:1075)
        at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:748)

"NioSocketConnector-68" #238 prio=5 os_prio=0 tid=0x00007f9c70caf000 nid=0x2522 in Object.wait() [0x00007f9c6af9f000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at org.apache.mina.core.future.DefaultIoFuture.await0(DefaultIoFuture.java:209)
        - locked <0x00000000f66ac718> (a org.apache.mina.core.future.DefaultIoFuture)
        at org.apache.mina.core.future.DefaultIoFuture.awaitUninterruptibly(DefaultIoFuture.java:141)
        at org.apache.mina.core.polling.AbstractPollingIoProcessor.dispose(AbstractPollingIoProcessor.java:188)
        at org.apache.mina.core.service.SimpleIoProcessorPool.dispose(SimpleIoProcessorPool.java:329)
        - locked <0x00000000f66ac750> (a java.lang.Object)
        at org.apache.mina.core.polling.AbstractPollingIoConnector$Connector.run(AbstractPollingIoConnector.java:582)
        at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:748)

It does not happen very often: about 5% of the connection attempts leave a NioSocketConnector hanging.
It only seems to happen when the connection is terminated by "javax.net.ssl.SSLHandshakeException: SSL handshake failed", although there are also cases where no leak occurs despite an SSLHandshakeException. If the connection was reset "normally" by "java.io.IOException: Connection reset by peer" then the leak does not seem to occur. It also does not occur when the connection is refused right away.
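
To count such leftovers without reading full thread dumps by hand, the check can be done from inside the JVM. The following is only a sketch using the standard Thread.getAllStackTraces() call; the class name StuckThreadFinder and the method threadsInside are hypothetical helpers, not part of MINA or QuickFIX/J. The class and method passed in main are taken from the stack traces above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class StuckThreadFinder {

    /** Returns the names of live threads whose stack contains a frame for className.methodName. */
    public static List<String> threadsInside(String className, String methodName) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            for (StackTraceElement frame : entry.getValue()) {
                if (frame.getClassName().equals(className)
                        && frame.getMethodName().equals(methodName)) {
                    names.add(entry.getKey().getName());
                    break; // one matching frame per thread is enough
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Frame taken from the NioSocketConnector stack trace above.
        List<String> stuck = threadsInside(
                "org.apache.mina.core.polling.AbstractPollingIoProcessor", "dispose");
        System.out.println(stuck.size() + " thread(s) inside dispose(): " + stuck);
    }
}
```

A thread that shows up in this list only once may simply be in the middle of a normal dispose(), so the check would have to be repeated a few seconds apart before a thread is counted as leaked.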

Since this seems to be related to SSL connections: is there something that we need to take care of when using the SSL filter?

The code for the IoSessionInitiator can be found here: https://github.com/quickfix-j/quickfixj/blob/master/quickfixj-core/src/main/java/quickfix/mina/initiator/IoSessionInitiator.java
I have added some comments in this gist (starting with "chrjohn"): https://gist.github.com/chrjohn/2671f06d80e8d917d9061b573477ec5b

I cannot rule out that we might be doing something wrong here, so any pointer is appreciated. :)

Thanks in advance for your help and best regards,
Chris.


--- End Message ---
