pqab opened a new issue, #21933: URL: https://github.com/apache/pulsar/issues/21933
### Search before asking

- [X] I searched in the [issues](https://github.com/apache/pulsar/issues) and found nothing similar.

### Version

2.10.5

### Minimal reproduce step

1. Create the topic
```
bin/pulsar-admin tenants create tenant1
bin/pulsar-admin namespaces create tenant1/namespace1
bin/pulsar-admin namespaces set-persistence --bookkeeper-ack-quorum 2 --bookkeeper-ensemble 3 --bookkeeper-write-quorum 3 --ml-mark-delete-max-rate 0 tenant1/namespace1
bin/pulsar-admin namespaces set-retention tenant1/namespace1 --size -1 --time 3d
bin/pulsar-admin namespaces set-message-ttl tenant1/namespace1 --messageTTL 604800
bin/pulsar-admin topics create-partitioned-topic tenant1/namespace1/topic1 -p 3
```
2. Produce large payloads and batches from the admin tool with TLS
```
bin/pulsar-perf produce persistent://tenant1/namespace1/topic1 -mk autoIncrement -bb 5242880 -r 5000 -s 5242 -bm 1000 -threads 30 --auth-plugin org.apache.pulsar.client.impl.auth.AuthenticationTls --auth-params '{"tlsCertFile":"conf/user.cer","tlsKeyFile":"conf/user.key.pem"}'
```
3. Stop the producer once it has produced around 1 million messages
4. Wait until all the messages go to the BookKeeper backlog
5. Start a consumer with TLS to consume all the messages
```
bin/pulsar-perf consume persistent://tenant1/namespace1/topic1 --auth-plugin org.apache.pulsar.client.impl.auth.AuthenticationTls --auth-params '{"tlsCertFile":"conf/user.cer","tlsKeyFile":"conf/user.key.pem"}' -sp Earliest -ss sub1
```

### What did you expect to see?

The consumer is able to consume all produced messages.

### What did you see instead?

The consumer stopped receiving messages partway through, and the broker logs showed errors like
```
2024-01-19T14:05:39,899+0000 [BookKeeperClientWorker-OrderedExecutor-4-0] ERROR org.apache.bookkeeper.proto.checksum.DigestManager - Mac mismatch for ledger-id: 852, entry-id: 35932
2024-01-19T14:05:39,902+0000 [BookKeeperClientWorker-OrderedExecutor-4-0] ERROR org.apache.bookkeeper.proto.checksum.DigestManager - Mac mismatch for ledger-id: 852, entry-id: 35932
2024-01-19T14:05:39,916+0000 [BookKeeperClientWorker-OrderedExecutor-4-0] ERROR org.apache.bookkeeper.proto.checksum.DigestManager - Mac mismatch for ledger-id: 852, entry-id: 35932
2024-01-19T14:05:39,916+0000 [BookKeeperClientWorker-OrderedExecutor-4-0] ERROR org.apache.bookkeeper.client.PendingReadOp - Read of ledger entry failed: L852 E35899-E35998, Sent to [100.87.157.209:3181, 100.111.147.236:3181, 100.96.184.253:3181], Heard from [100.87.157.209:3181, 100.111.147.236:3181, 100.96.184.253:3181] : bitset = {0, 1, 2}, Error = 'Entry digest does not match'. First unread entry is (35973, rc = 0)
2024-01-19T14:05:39,916+0000 [broker-topic-workers-OrderedExecutor-15-0] ERROR org.apache.pulsar.broker.service.persistent.PersistentDispatcherSingleActiveConsumer - [persistent://tenant1/namespace1/topic1-0 / sub1-Consumer{subscription=PersistentSubscription{topic=persistent://tenant1/namespace1/topic1-0, name=sub1}, consumerId=0, consumerName=383fd, address=/100.96.184.253:50090}] Error reading entries at 852:35899 : Entry digest does not match - Retrying to read in 15.0 seconds
```

### Anything else?

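To gauge how many entries are affected, the `Mac mismatch` lines from a broker log like the one above can be reduced to unique ledger/entry pairs. The following is only a minimal shell sketch: the function name `list_mac_mismatches` and the log path in the usage comment are hypothetical, and it assumes each log record sits on its own line, as in an unmodified broker log file.

```shell
#!/bin/sh
# Hypothetical helper (not part of Pulsar or BookKeeper):
# reads broker log text on stdin and prints one line per unique
# "ledger-id entry-id" pair found in DigestManager "Mac mismatch"
# errors, followed by how many times that pair was reported.
list_mac_mismatches() {
  sed -n 's/.*Mac mismatch for ledger-id: \([0-9][0-9]*\), entry-id: \([0-9][0-9]*\).*/\1 \2/p' \
    | sort \
    | uniq -c \
    | awk '{ print $2, $3, $1 }'
}

# Usage (the log path is an assumption; adjust it to your deployment):
# list_mac_mismatches < logs/pulsar-broker.log
```

Each output line has the form `<ledger-id> <entry-id> <count>`. Repeated reports of the same pair, as in the excerpt above, collapse into a single line, which makes it easier to see whether the mismatches are confined to a few entries or spread across a ledger.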
This seems to happen only when an SSL exception occurs in the middle of producing, such as
```
2024-01-19T13:39:13,450+0000 [pulsar-client-io-12-1] WARN org.apache.pulsar.client.impl.ClientCnx - Got exception io.netty.handler.codec.DecoderException: io.netty.handler.ssl.ReferenceCountedOpenSslEngine$OpenSslException: error:100003fc:SSL routines:OPENSSL_internal:SSLV3_ALERT_BAD_RECORD_MAC
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:499)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
    at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:800)
    at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:499)
    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:397)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: io.netty.handler.ssl.ReferenceCountedOpenSslEngine$OpenSslException: error:100003fc:SSL routines:OPENSSL_internal:SSLV3_ALERT_BAD_RECORD_MAC
    at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.newSSLExceptionForError(ReferenceCountedOpenSslEngine.java:1377)
    at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.shutdownWithError(ReferenceCountedOpenSslEngine.java:1089)
    at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.sslReadErrorResult(ReferenceCountedOpenSslEngine.java:1399)
    at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap(ReferenceCountedOpenSslEngine.java:1325)
    at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap(ReferenceCountedOpenSslEngine.java:1426)
    at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap(ReferenceCountedOpenSslEngine.java:1469)
    at io.netty.handler.ssl.SslHandler$SslEngineType$1.unwrap(SslHandler.java:223)
    at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1353)
    at io.netty.handler.ssl.SslHandler.decodeNonJdkCompatible(SslHandler.java:1257)
    at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1297)
    at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:529)
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:468)
    ... 15 more
```
or
```
2024-01-19T14:01:02,532+0000 [pulsar-client-io-6-1] WARN org.apache.pulsar.client.impl.ClientCnx - Got exception io.netty.handler.codec.DecoderException: io.netty.handler.ssl.ReferenceCountedOpenSslEngine$OpenSslException: error:10000438:SSL routines:OPENSSL_internal:TLSV1_ALERT_INTERNAL_ERROR
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:499)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
    at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:800)
    at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:499)
    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:397)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: io.netty.handler.ssl.ReferenceCountedOpenSslEngine$OpenSslException: error:10000438:SSL routines:OPENSSL_internal:TLSV1_ALERT_INTERNAL_ERROR
    at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.newSSLExceptionForError(ReferenceCountedOpenSslEngine.java:1377)
    at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.shutdownWithError(ReferenceCountedOpenSslEngine.java:1089)
    at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.sslReadErrorResult(ReferenceCountedOpenSslEngine.java:1399)
    at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap(ReferenceCountedOpenSslEngine.java:1325)
    at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap(ReferenceCountedOpenSslEngine.java:1426)
    at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap(ReferenceCountedOpenSslEngine.java:1469)
    at io.netty.handler.ssl.SslHandler$SslEngineType$1.unwrap(SslHandler.java:223)
    at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1353)
    at io.netty.handler.ssl.SslHandler.decodeNonJdkCompatible(SslHandler.java:1257)
    at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1297)
    at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:529)
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:468)
    ... 15 more
```

### Are you willing to submit a PR?

- [ ] I'm willing to submit a PR!

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
