cdbartholomew commented on issue #6683: [broker]Handle BadVersionException thrown by updateSchemaLocator() URL: https://github.com/apache/pulsar/pull/6683#issuecomment-611821295 I am seeing this exact issue while testing the 2.5.1 candidate load: ``` curl localhost:8080/admin/v2/brokers/health --- An unexpected error occurred in the server --- Message: org.apache.pulsar.client.api.PulsarClientException: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion Stacktrace: java.util.concurrent.CompletionException: org.apache.pulsar.client.api.PulsarClientException: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331) at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346) at java.base/java.util.concurrent.CompletableFuture$BiRelay.tryFire(CompletableFuture.java:1423) at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) at org.apache.pulsar.client.impl.ProducerImpl.lambda$connectionOpened$14(ProducerImpl.java:1205) at java.base/java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:986) at java.base/java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:970) at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) at org.apache.pulsar.client.impl.ClientCnx.handleError(ClientCnx.java:609) at org.apache.pulsar.common.protocol.PulsarDecoder.channelRead(PulsarDecoder.java:171) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355) at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:321) at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:295) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355) at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:792) at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:475) at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378) at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.base/java.lang.Thread.run(Thread.java:834) Caused by: org.apache.pulsar.client.api.PulsarClientException: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion at org.apache.pulsar.client.impl.ClientCnx.getPulsarClientException(ClientCnx.java:1005) ... 21 more ``` It happens every time the check is run and the topic doesn't already exist. So, the first time it fails like this and then it will work until the topic is deleted, which will happen automatically after a while with default broker settings. I think it would be a good idea to cherry-pick this fix to 2.5.1 since it makes it look like the broker is broken when the user requests a heartbeat in Pulsar Manager. This is also a regression compared to 2.5.0 which doesn't have this issue.
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services
