Hi,
We’re leveraging netty (v4.1.92) via Vertx (v4.3.8)
configured to use OpenJdkSSLEngine with a deletegatedTaskExecutor() when
creating the ssl_handler for our webserver. In general, we see SSL
operations (e.g., certificate validation) occurring on a worker thread, but
we have run into a deadlock due to our current implementation and wanted
some feedback on options.
The crux of the issue is that during our certificate validation via our
custom TrustManager, we need to communicate with another internal component
(which is running on a typical event loop thread) to complete the
validation processing, but other SSL operations block the event loop on
which that internal service is running by grabbing a lock. Its my
understanding that not all blocking SSL operations are offloaded, so there
are likely more cases like these floating around.
What we are specifically seeing is that while we’re waiting on a response
from the internal component to validate the certificate, a simultaneous
request to close a TLS connection comes in. The closing of the session is
running on the event loop and blocks by waiting on a lock, but
unfortunately, that lock is held by the worker thread waiting on a response
from the internal component. IOW, the internal service cannot respond to
the worker thread because it its event loop is blocked by the close
connection processing. At this point, our entire webserver is dead in the
water until we restart the JVM.
Sample traces are listed below from this example:
- Close connection processing is running on event loop thread 0 and is
blocked trying to grab a lock that is held by a worker thread
- Cert validation in the TrustManager is running on a worker thread and
is awaiting a response from the blocked service running on event loop
thread 0
- Internal service to complete the cert validation is affined to event
loop thread 0, but its blocked because of the close connection processing
We’ve looked at several options:
- Detect the deadlock and fail the client authentication. The issue
here is bad user experience so we’d like to avoid it.
- Segregate the our TLS handling in the webserver and the blocking SSL
handling into two DIFFERENT worker threads so that that we never block an
event loop. This is a risker change for us, though.
- Remove the need to communicate with the internal service running on an
event loop thread.
Any thoughts on other/better ways to handle this?
Thanks in advance,
Chris
# close connection handler running on event loop thread 0 is blocked
Sep 15 10:13:43.470794 APM00203802612-A control-path[37578]:
"vert.x-eventloop-thread-0" #25 prio=5 os_prio=0 cpu=595313.19ms
elapsed=28293.78s tid=0x00007f5bb8021800 nid=0x22c waiting for monitor
entry [0x00007f5c0034d000]
Sep 15 10:13:43.471494 APM00203802612-A control-path[37578]:
java.lang.Thread.State: BLOCKED (on object monitor)
Sep 15 10:13:43.471494 APM00203802612-A control-path[37578]: at
sun.security.ssl.SSLEngineImpl.closeOutbound(
[email protected]/SSLEngineImpl.java:749)
Sep 15 10:13:43.471494 APM00203802612-A control-path[37578]: -
waiting to lock <0x00000000dca1d0f8> (a sun.security.ssl.SSLEngineImpl)
Sep 15 10:13:43.471494 APM00203802612-A control-path[37578]: at
io.netty.handler.ssl.SslHandler.setHandshakeFailure(SslHandler.java:1874)
Sep 15 10:13:43.471494 APM00203802612-A control-path[37578]: at
io.netty.handler.ssl.SslHandler.channelInactive(SslHandler.java:1078)
Sep 15 10:13:43.471494 APM00203802612-A control-path[37578]: at
io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:305)
Sep 15 10:13:43.471494 APM00203802612-A control-path[37578]: at
io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281)
Sep 15 10:13:43.471494 APM00203802612-A control-path[37578]: at
io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:274)
Sep 15 10:13:43.471494 APM00203802612-A control-path[37578]: at
io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405)
Sep 15 10:13:43.471494 APM00203802612-A control-path[37578]: at
io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:301)
Sep 15 10:13:43.471494 APM00203802612-A control-path[37578]: at
io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281)
Sep 15 10:13:43.471494 APM00203802612-A control-path[37578]: at
io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901)
Sep 15 10:13:43.471494 APM00203802612-A control-path[37578]: at
io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:813)
Sep 15 10:13:43.471494 APM00203802612-A control-path[37578]: at
io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)
Sep 15 10:13:43.471494 APM00203802612-A control-path[37578]: at
io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)
Sep 15 10:13:43.471494 APM00203802612-A control-path[37578]: at
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
Sep 15 10:13:43.471494 APM00203802612-A control-path[37578]: at
io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:566)
Sep 15 10:13:43.471494 APM00203802612-A control-path[37578]: at
io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
Sep 15 10:13:43.471494 APM00203802612-A control-path[37578]: at
io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
Sep 15 10:13:43.471494 APM00203802612-A control-path[37578]: at
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
Sep 15 10:13:43.471494 APM00203802612-A control-path[37578]: at
java.lang.Thread.run([email protected]/Thread.java:829)
# Trust manager waiting on a response from our internal component which is
running on event loop thread 0. Note that the lock that the close
connection handler is waiting on is held by this thread.
Sep 15 10:13:43.479257 APM00203802612-A control-path[37578]:
"vert.x-internal-blocking-1" #42 prio=5 os_prio=0 cpu=23498.06ms
elapsed=28290.74s tid=0x00007f5bb02b1000 nid=0x23b waiting on condition
[0x00007f5bc4e59000]Sep 15 10:13:43.479257 APM00203802612-A
control-path[37578]: java.lang.Thread.State: WAITING (parking)Sep 15
10:13:43.479257 APM00203802612-A control-path[37578]: at
jdk.internal.misc.Unsafe.park([email protected]/Native Method)Sep 15
10:13:43.479257 APM00203802612-A control-path[37578]: - parking to
wait for <0x00000000dca24ec0> (a java.util.concurrent.CountDownLatch$Sync)Sep
15 10:13:43.479257 APM00203802612-A control-path[37578]: at
java.util.concurrent.locks.LockSupport.park(
[email protected]/LockSupport.java:194)Sep 15 10:13:43.479257
APM00203802612-A control-path[37578]: at
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(
[email protected]/AbstractQueuedSynchronizer.java:885)Sep 15
10:13:43.479257 APM00203802612-A control-path[37578]: at
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(
[email protected]/AbstractQueuedSynchronizer.java:1039)Sep 15
10:13:43.479257 APM00203802612-A control-path[37578]: at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(
[email protected]/AbstractQueuedSynchronizer.java:1345)Sep 15
10:13:43.479257 APM00203802612-A control-path[37578]: at
java.util.concurrent.CountDownLatch.await(
[email protected]/CountDownLatch.java:232)Sep 15 10:13:43.479257
APM00203802612-A control-path[37578]: at
io.reactivex.rxjava3.internal.observers.BlockingMultiObserver.blockingGet(BlockingMultiObserver.java:86)Sep
15 10:13:43.479257 APM00203802612-A control-path[37578]: at
io.reactivex.rxjava3.core.Single.blockingGet(Single.java:3645)Sep 15
10:13:43.479257 APM00203802612-A control-path[37578]: at
com.emc.cyclone.contexts.security.credstore.CycTrustOptions$CycTrustManager.validateCertificateWithCmfa(CycTrustOptions.java:442)
--
You received this message because you are subscribed to the Google Groups
"Netty discussions" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/netty/8f31febf-5aaa-47b5-8a37-b71d63edf6acn%40googlegroups.com.