An update on this: per https://github.com/openssl/openssl/issues/8068, it
looks like BoringSSL should avoid this issue, so the error may be related to
client behavior of some sort. It's unclear to me from the log message whether
intra-cluster traffic or client/cluster traffic is generating the error.
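
One way to narrow down which traffic is involved is to tally the
TOO_MANY_KEY_UPDATES entries by the source file that logged them and by
minute, then compare the result with the times of the dropped-message
reports. The following is only a minimal sketch in Python. It assumes the log
header layout seen in the quoted message (level, thread in brackets,
timestamp, source file and line), and the function name
count_key_update_errors is made up for illustration.

```python
import re
from collections import Counter

# Header of a log entry in the layout seen in the quoted message, e.g.
# WARN  [epollEventLoopGroup-5-31] 2023-04-12 15:43:34,476 PreV5Handlers.java:261 - ...
# (assumed layout; adjust the pattern if your logback configuration differs)
HEADER = re.compile(
    r"^(?P<level>[A-Z]+)\s+\[(?P<thread>[^\]]+)\]\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+\s+"
    r"(?P<source>\S+\.java):\d+ - "
)
MARKER = "TOO_MANY_KEY_UPDATES"


def count_key_update_errors(lines):
    """Count log entries mentioning MARKER, keyed by (source file, minute).

    Hypothetical helper: an entry is a header line plus the lines that
    follow it up to the next header. Each entry is counted at most once,
    even though the marker usually appears again under "Caused by:".
    """
    counts = Counter()
    current = None   # (source file, "YYYY-MM-DD HH:MM") of the current entry
    counted = False  # whether the current entry has already been counted
    for line in lines:
        match = HEADER.match(line)
        if match:
            current = (match["source"], match["ts"][:16])
            counted = False
        if MARKER in line and current is not None and not counted:
            counts[current] += 1
            counted = True
    return counts
```

For example, count_key_update_errors(open("system.log")) returns a Counter
whose keys show which logging source reported the error in each minute (the
path is a placeholder). In the quoted message the warning comes from
PreV5Handlers.java with the text "Unknown exception in client networking".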

On Wed, Apr 12, 2023 at 11:36 AM Elliott Sims <elli...@backblaze.com> wrote:

> A few weeks ago, we rolled out TLS among hosts in our clusters (running
> 4.0.7).  More recently we also rolled out TLS between Cassandra clients and
> the cluster.  Today, we started seeing a lot of dropped actions in one
> cluster that correlate with warnings like this:
>
> WARN  [epollEventLoopGroup-5-31] 2023-04-12 15:43:34,476 PreV5Handlers.java:261 - Unknown exception in client networking
> io.netty.handler.codec.DecoderException: javax.net.ssl.SSLException: error:10000104:SSL routines:OPENSSL_internal:TOO_MANY_KEY_UPDATES
>         at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:478)
>         at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276)
>         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>         at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
>         at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
>         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>         at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
>         at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795)
>         at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480)
>         at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
>         at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>         at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>         at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>         at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: javax.net.ssl.SSLException: error:10000104:SSL routines:OPENSSL_internal:TOO_MANY_KEY_UPDATES
>         at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.shutdownWithError(ReferenceCountedOpenSslEngine.java:1028)
>         at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.sslReadErrorResult(ReferenceCountedOpenSslEngine.java:1321)
>         at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap(ReferenceCountedOpenSslEngine.java:1270)
>         at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap(ReferenceCountedOpenSslEngine.java:1346)
>         at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap(ReferenceCountedOpenSslEngine.java:1389)
>         at io.netty.handler.ssl.SslHandler$SslEngineType$1.unwrap(SslHandler.java:206)
>         at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1387)
>         at io.netty.handler.ssl.SslHandler.decodeNonJdkCompatible(SslHandler.java:1294)
>         at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1331)
>         at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:508)
>         at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:447)
>         ... 15 common frames omitted
>
> INFO  [ScheduledTasks:1] 2023-04-12 15:46:19,701 MessagingMetrics.java:206 - READ_RSP messages were dropped in last 5000 ms: 0 internal and 3 cross node. Mean internal dropped latency: 0 ms and Mean cross-node dropped latency: 5960 ms
>
> This looks similar to a bug in OpenSSL that was fixed in 2019:
> https://github.com/openssl/openssl/pull/8299
> but the equivalent change doesn't seem to have been ported over to
> BoringSSL.  Has anyone else run across this, or does anyone have a
> workaround?
>
>

