ivankelly commented on issue #2942: Intermittent test failure in TestS3Offload URL: https://github.com/apache/pulsar/issues/2942#issuecomment-442068673 The issue is due to the broker being rebooted, but I'm not sure if the client behaviour is problematic or not. It's nothing to do with offload in any case. The sequence is: 1. Client writes a bunch of messages 2. Offload kicks some segments out to S3. 3. The test deletes a ledger and reboots the broker to ensure readback. 4. The client starts reading back the topic from the very start. The problem happens at the reboot and readback phase. The broker reboots and starts serving reads. The client manages to read a good bit. ``` 12:52:21.726 [TestNG-method=testPublishOffloadAndConsumeViaCLI-1:org.apache.pulsar.tests.integration.offload.TestS3Offload@149] INFO org.apache.pulsar.tests.integration.offload.TestS3Offload - IKDEBUG read back entry 999 msgid 17:999:-1 12:52:22.134 [pulsar-client-io-23-1:org.apache.pulsar.client.impl.ClientCnx@183] INFO org.apache.pulsar.client.impl.ClientCnx - [id: 0xe4bec5be, L:/127.0.0.1:52048 ! R:localhost/127.0.0.1:46664] Disconnected 12:52:22.135 [pulsar-client-io-23-1:org.apache.pulsar.client.impl.ConnectionHandler@109] INFO org.apache.pulsar.client.impl.ConnectionHandler - [persistent://s3-offload-test-cli-aojz/ns1/topic1] [my-sub] Closed connection [id: 0xe4bec5be, L:/127.0.0.1:52048 ! R:localhost/127.0.0.1:46664] -- Will try again in 0.1 s 12:52:22.236 [pulsar-timer-25-1:org.apache.pulsar.client.impl.ConnectionHandler@112] INFO org.apache.pulsar.client.impl.ConnectionHandler - [persistent://s3-offload-test-cli-aojz/ns1/topic1] [my-sub] Reconnecting after timeout 12:52:22.243 [pulsar-client-io-23-1:org.apache.pulsar.client.impl.ClientCnx@720] ERROR org.apache.pulsar.client.impl.ClientCnx - [id: 0xbad8308a, L:/127.0.0.1:52042 - R:localhost/127.0.0.1:46664] Close connection because received internal-server error org.apache.pulsar.client.api.PulsarClientException: java.util.concurrent.CompletionException: io.netty.channel.AbstractChannel$AnnotatedConnectException: syscall:getsockopt(..) failed: Connection refused: pulsar-broker-0/172.18.0.6:6650 12:52:22.244 [pulsar-client-io-23-1:org.apache.pulsar.client.impl.BinaryProtoLookupService@151] WARN org.apache.pulsar.client.impl.BinaryProtoLookupService - [persistent://s3-offload-test-cli-aojz/ns1/topic1] failed to send lookup request : org.apache.pulsar.client.api.PulsarClientException$LookupException: org.apache.pulsar.client.api.PulsarClientException: java.util.concurrent.CompletionException: io.netty.channel.AbstractChannel$AnnotatedConnectException: syscall:getsockopt(..) failed: Connection refused: pulsar-broker-0/172.18.0.6:6650 java.util.concurrent.CompletionException: org.apache.pulsar.client.api.PulsarClientException$LookupException: org.apache.pulsar.client.api.PulsarClientException: java.util.concurrent.CompletionException: io.netty.channel.AbstractChannel$AnnotatedConnectException: syscall:getsockopt(..) failed: Connection refused: pulsar-broker-0/172.18.0.6:6650 at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_191] at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_191] at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:647) ~[?:1.8.0_191] at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:632) ~[?:1.8.0_191] at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) [?:1.8.0_191] at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977) [?:1.8.0_191] at org.apache.pulsar.client.impl.ClientCnx.handleLookupResponse(ClientCnx.java:380) [pulsar-client-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT] at org.apache.pulsar.common.api.PulsarDecoder.channelRead(PulsarDecoder.java:118) [pulsar-client-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT] at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [pulsar-client-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT] at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [pulsar-client-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT] at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [pulsar-client-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT] at ``` Then the broker realizes it can't get register, so kills itself. ``` 12:52:21.754 [main] ERROR org.apache.pulsar.broker.loadbalance.impl.ModularLoadManagerImpl - Broker znode - [/loadbalance/brokers/pulsar-broker-0:8080] is own by different zookeeper-ssession 72259437335805978 12:52:21.755 [main] ERROR org.apache.pulsar.broker.loadbalance.impl.ModularLoadManagerImpl - Unable to create znode - [/loadbalance/brokers/pulsar-broker-0:8080] for load balance on zookeeper ``` When the broker comes back the subscription is reset back to the start (because we hadn't acked), but the client continues as if its reading from where it left off. @merlimat @sijie @jiazhai So a couple of questions: - Do we expect that users of the consumer keep track to ensure message ids are monotonically increasing if they want guarantees on the topic? - Why can the client read from a broker before the broker has registered? All info in build: https://builds.apache.org/job/pulsar_precommit_integrationtests/5164/
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services
