[
https://issues.apache.org/jira/browse/AMQ-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Oliver Deckert updated AMQ-5368:
--------------------------------
Description:
Using the NIOSSL transport, SSL handshakes for ~5000 connections easily stall a
broker, which then runs at 100% CPU.
I'm using ActiveMQ 5.8, but it occurs on 5.9 and 5.10 as well.
Profiling showed that the SSL handshake on the broker side eats up ~90% of
overall CPU time, just by checking the handshake status at very high frequency.
Top 3 methods, sorted by own processor time:
com.sun.net.ssl.internal.ssl.SSLEngineImpl.getHandshakeStatus()
org.apache.activemq.transport.nio.NIOSSLTransport.doHandshake()
com.sun.net.ssl.internal.ssl.SSLEngineImpl.getHSStatus(javax.net.ssl.SSLEngineResult$HandshakeStatus)
The reason is the asynchronous nature of the SSL handshake with NIO, especially
the execution of delegated tasks:
- NIOSSLTransport.doHandshake() executes delegated tasks asynchronously using a
TaskRunnerFactory
- in the meantime it loops, calling SSLEngine.getHandshakeStatus()
To improve the situation I made the following changes:
- run delegated tasks synchronously in doHandshake() (handshake status
NEED_TASK) instead of asynchronously
- added some small wait cycles in secureRead(), as there is not always data
available with NIO (to further reduce the number of calls to
SSLEngine.getHandshakeStatus())
After these changes, the SSL handshake for several thousand connections in
parallel was no longer a problem.
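
For illustration, the two changes described above could look roughly like the
following sketch. It is not the attached NIOSSLTransport.patch and not ActiveMQ
code: the class HandshakeSketch, the methods runDelegatedTasks and
readWithBackoff, and the maxAttempts/sleepMillis parameters are all assumptions
made for the example. Only the SSLEngine and NIO channel calls are standard JDK
API.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLEngineResult.HandshakeStatus;

// Hypothetical helper class for illustration; not part of ActiveMQ.
class HandshakeSketch {

    // Runs every pending delegated task on the calling thread and then
    // returns the engine's handshake status. Because the tasks have finished
    // by the time this returns, the caller has no reason to poll
    // getHandshakeStatus() in a loop while another thread does the work.
    static HandshakeStatus runDelegatedTasks(SSLEngine engine) {
        Runnable task;
        while ((task = engine.getDelegatedTask()) != null) {
            task.run();
        }
        return engine.getHandshakeStatus();
    }

    // Reads from a (typically non-blocking) channel. If a read returns 0
    // bytes, sleeps for sleepMillis and retries, for at most maxAttempts
    // reads in total. Returns the result of the last read: a positive byte
    // count, 0 if nothing arrived, or -1 at end of stream.
    // The retry count and sleep time are assumed tuning knobs.
    static int readWithBackoff(ReadableByteChannel channel, ByteBuffer buffer,
                               int maxAttempts, long sleepMillis)
            throws IOException, InterruptedException {
        int read = channel.read(buffer);
        for (int attempt = 1; read == 0 && attempt < maxAttempts; attempt++) {
            Thread.sleep(sleepMillis);
            read = channel.read(buffer);
        }
        return read;
    }
}
```

In a handshake loop, runDelegatedTasks(engine) would be called when the status
is NEED_TASK, and readWithBackoff(...) where the loop would otherwise re-read an
empty channel immediately. The trade-off is that both calls block the calling
thread for a short time, so the sleep and retry values would need tuning for a
real broker.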
> SSL handshake stalls broker with NIO
> ------------------------------------
>
> Key: AMQ-5368
> URL: https://issues.apache.org/jira/browse/AMQ-5368
> Project: ActiveMQ
> Issue Type: Bug
> Components: Transport
> Affects Versions: 5.8.0
> Environment: java version "1.7.0_65"
> Reporter: Oliver Deckert
> Labels: nio, ssl
> Attachments: NIOSSLTransport.patch
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)