[ 
https://issues.apache.org/jira/browse/AMQ-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oliver Deckert updated AMQ-5368:
--------------------------------
      Component/s:     (was: JMS client)
                   Transport
      Description: 
using NIOSSL transport, SSL handshakes for ~5000 connections easily stalls a 
broker taking 100% CPU

I'm using version ActiveMQ 5.8, but it occurs on 5.9, 5.10 versions as well

doing some profiling, it showed up that the SSL handshake on broker side eats 
up ~90% of overall CPU time
by checking just the handshake status in very high frequency

top 3 methods sorted by own processor time:
com.sun.net.ssl.internal.ssl.SSLEngineImpl.getHandshakeStatus()
org.apache.activemq.transport.nio.NIOSSLTransport.doHandshake()
com.sun.net.ssl.internal.ssl.SSLEngineImpl.getHSStatus(javax.net.ssl.SSLEngineResult$HandshakeStatus)

the reason is the asynchronous nature of the SSL handshake with NIO, especially 
the execution of delegated tasks:
- NIOSSLTransport.doHandshake() executes delegated tasks using a 
TaskRunnerFactory asynchronously
- in the meantime it loops calling SSLEngine.getHandshakeStatus()

to improve the situation I did the following changes:
- run delegated tasks synchronously in method doHandshake (handshake status 
NEED_TASK) instead of asynchronously
- added some small wait cycles in method secureRead as there is not always data 
available with NIO (to further reduce the number of calls to 
SSLEngine.getHandshakeStatus)

after these changes the SSL handshake for several thousand connections in 
parallel was not a problem anymore


  was:
Start with a client application running qpid v0.24 connecting to activemq 
v5.8.0 server over amqps. Configure the activemq server to use client SSL 
certificates for authentication.

{code}
                <!-- aqmp with SSL client certs -->
         <transportConnector name="amqps" 
uri="amqp+ssl://0.0.0.0:5672?maximumConnections=1000&amp;wireformat.maxFrameSize=104857600&amp;transport.transformer=jms&amp;needClientAuth=true"/>
{code}

This works and messages successfully flow from server to client. Qpid however 
has a fatal bug where it cannot recover from broken connections, and so attempt 
to switch to the activemq amqp client to work around this problem.

On the client, we initialise activemq-amqp with the following parameters:

{code}
<!-- activemq -->
  <Parameter name="java.naming.factory.initial" 
value="org.apache.activemq.jndi.ActiveMQInitialContextFactory" />
  <Parameter name="connectionFactoryNames" value="ConnectionFactory, 
QueueConnectionFactory, TopicConnectionFactory" />

  <!-- Server to connect to -->
<!-- activemq -->
  <Parameter name="java.naming.provider.url" 
value="amqp+nio+ssl://amqp.${env:SERVER_ENV}.example.com:5672" />
{code}

With activemq-amqp in place instead of qpid, the client starts up, but no 
messages are processed. Instead, it is found that the aqmp+nio+ssl provider is 
spinning the CPU at 100% part of the way through the SSL handshake process.

A thread dump of the spinning thread is as follows:

{code}
Full thread dump Java HotSpot(TM) 64-Bit Server VM (23.25-b01 mixed mode):

"localhost-startStop-1" daemon prio=10 tid=0x000000000179b800 nid=0x638e 
runnable [0x00007fd1fd84a000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:225)
        at sun.nio.ch.IOUtil.read(IOUtil.java:198)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:375)
        - locked <0x00000000c4da50e8> (a java.lang.Object)
        at 
org.apache.activemq.transport.nio.NIOSSLTransport.secureRead(NIOSSLTransport.java:285)
        at 
org.apache.activemq.transport.nio.NIOSSLTransport.doHandshake(NIOSSLTransport.java:333)
        at 
org.apache.activemq.transport.nio.NIOSSLTransport.initializeStreams(NIOSSLTransport.java:128)
        at 
org.apache.activemq.transport.amqp.AmqpNioSslTransport.initializeStreams(AmqpNioSslTransport.java:43)
        at 
org.apache.activemq.transport.tcp.TcpTransport.connect(TcpTransport.java:514)
        at 
org.apache.activemq.transport.nio.NIOTransport.doStart(NIOTransport.java:156)
        at 
org.apache.activemq.transport.nio.NIOSSLTransport.doStart(NIOSSLTransport.java:356)
        at org.apache.activemq.util.ServiceSupport.start(ServiceSupport.java:55)
        at 
org.apache.activemq.transport.TransportFilter.start(TransportFilter.java:58)
        at 
org.apache.activemq.transport.TransportFilter.start(TransportFilter.java:58)
        at 
org.apache.activemq.transport.TransportFilter.start(TransportFilter.java:58)
        at 
org.apache.activemq.ActiveMQConnectionFactory.createActiveMQConnection(ActiveMQConnectionFactory.java:273)
        at 
org.apache.activemq.ActiveMQConnectionFactory.createActiveMQConnection(ActiveMQConnectionFactory.java:238)
        at 
org.apache.activemq.ActiveMQConnectionFactory.createConnection(ActiveMQConnectionFactory.java:184)
...
{code}

If an attempt is made to restart the activemq server, despite the spinning 
thread on the client the server side disconnection is detected by the client 
and the following exception is logged and the connection is successfully 
aborted:

{code}
Caused by: java.io.IOException: javax.net.ssl.SSLException: Received 
close_notify during handshake
        at 
org.apache.activemq.transport.nio.NIOSSLTransport.initializeStreams(NIOSSLTransport.java:130)
        at 
org.apache.activemq.transport.amqp.AmqpNioSslTransport.initializeStreams(AmqpNioSslTransport.java:43)
        at 
org.apache.activemq.transport.tcp.TcpTransport.connect(TcpTransport.java:514)
        at 
org.apache.activemq.transport.nio.NIOTransport.doStart(NIOTransport.java:156)
        at 
org.apache.activemq.transport.nio.NIOSSLTransport.doStart(NIOSSLTransport.java:356)
        at org.apache.activemq.util.ServiceSupport.start(ServiceSupport.java:55)
        at 
org.apache.activemq.transport.TransportFilter.start(TransportFilter.java:58)
        at 
org.apache.activemq.transport.TransportFilter.start(TransportFilter.java:58)
        at 
org.apache.activemq.transport.TransportFilter.start(TransportFilter.java:58)
        at 
org.apache.activemq.ActiveMQConnectionFactory.createActiveMQConnection(ActiveMQConnectionFactory.java:273)
        ... 33 more
Caused by: javax.net.ssl.SSLException: Received close_notify during handshake
        at sun.security.ssl.Alerts.getSSLException(Alerts.java:208)
        at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1619)
        at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1587)
        at sun.security.ssl.SSLEngineImpl.recvAlert(SSLEngineImpl.java:1732)
        at sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:1060)
        at sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:884)
        at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:758)
        at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624)
        at 
org.apache.activemq.transport.nio.NIOSSLTransport.secureRead(NIOSSLTransport.java:304)
        at 
org.apache.activemq.transport.nio.NIOSSLTransport.doHandshake(NIOSSLTransport.java:333)
        at 
org.apache.activemq.transport.nio.NIOSSLTransport.initializeStreams(NIOSSLTransport.java:128)
        ... 42 more
{code}

What seems to be happening is that amqp+nio+ssl gets part of the way through 
the handshake fails and goes into a spin, continuing the spin until the TCP 
connection is killed from the other side, at which point the client aborts and 
continues as if nothing had happened.


       Patch Info: Patch Available
      Environment: 
java version "1.7.0_65"


  was:
java version "1.7.0_25"
Java(TM) SE Runtime Environment (build 1.7.0_25-b15)
Java HotSpot(TM) 64-Bit Server VM (build 23.25-b01, mixed mode)


    Fix Version/s:     (was: 5.9.0)
           Labels: nio ssl  (was: amqp nio ssl)

> SSL handshake stalls broker with NIO
> ------------------------------------
>
>                 Key: AMQ-5368
>                 URL: https://issues.apache.org/jira/browse/AMQ-5368
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Transport
>    Affects Versions: 5.8.0
>         Environment: java version "1.7.0_65"
>            Reporter: Oliver Deckert
>              Labels: nio, ssl
>
> using NIOSSL transport, SSL handshakes for ~5000 connections easily stalls a 
> broker taking 100% CPU
> I'm using version ActiveMQ 5.8, but it occurs on 5.9, 5.10 versions as well
> doing some profiling, it showed up that the SSL handshake on broker side eats 
> up ~90% of overall CPU time
> by checking just the handshake status in very high frequency
> top 3 methods sorted by own processor time:
> com.sun.net.ssl.internal.ssl.SSLEngineImpl.getHandshakeStatus()
> org.apache.activemq.transport.nio.NIOSSLTransport.doHandshake()
> com.sun.net.ssl.internal.ssl.SSLEngineImpl.getHSStatus(javax.net.ssl.SSLEngineResult$HandshakeStatus)
> the reason is the asynchronous nature of the SSL handshake with NIO, 
> especially the execution of delegated tasks:
> - NIOSSLTransport.doHandshake() executes delegated tasks using a 
> TaskRunnerFactory asynchronously
> - in the meantime it loops calling SSLEngine.getHandshakeStatus()
> to improve the situation I did the following changes:
> - run delegated tasks synchronously in method doHandshake (handshake status 
> NEED_TASK) instead of asynchronously
> - added some small wait cycles in method secureRead as there is not always 
> data available with NIO (to further reduce the number of calls to 
> SSLEngine.getHandshakeStatus)
> after these changes the SSL handshake for several thousand connections in 
> parallel was not a problem anymore



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to