ivankelly commented on issue #2942: Intermittent test failure in TestS3Offload
URL: https://github.com/apache/pulsar/issues/2942#issuecomment-442068673
 
 
   The issue is due to the broker being rebooted, but I'm not sure if the 
client behaviour is problematic or not. It's nothing to do with offload in any 
case.
   
   The sequence is:
   1. Client writes a bunch of messages
   2. Offload kicks some segments out to S3.
   3. The test deletes a ledger and reboots the broker to ensure readback.
   4. The client starts reading back the topic from the very start.
   
   The problem happens at the reboot and readback phase. The broker reboots and 
starts serving reads. 
   The client manages to read a good bit. 
   ```
   12:52:21.726 
[TestNG-method=testPublishOffloadAndConsumeViaCLI-1:org.apache.pulsar.tests.integration.offload.TestS3Offload@149]
 INFO  org.apache.pulsar.tests.integration.offload.TestS3Offload - IKDEBUG read 
back entry 999 msgid 17:999:-1
   12:52:22.134 
[pulsar-client-io-23-1:org.apache.pulsar.client.impl.ClientCnx@183] INFO  
org.apache.pulsar.client.impl.ClientCnx - [id: 0xe4bec5be, L:/127.0.0.1:52048 ! 
R:localhost/127.0.0.1:46664] Disconnected
   12:52:22.135 
[pulsar-client-io-23-1:org.apache.pulsar.client.impl.ConnectionHandler@109] 
INFO  org.apache.pulsar.client.impl.ConnectionHandler - 
[persistent://s3-offload-test-cli-aojz/ns1/topic1] [my-sub] Closed connection 
[id: 0xe4bec5be, L:/127.0.0.1:52048 ! R:localhost/127.0.0.1:46664] -- Will try 
again in 0.1 s
   12:52:22.236 
[pulsar-timer-25-1:org.apache.pulsar.client.impl.ConnectionHandler@112] INFO  
org.apache.pulsar.client.impl.ConnectionHandler - 
[persistent://s3-offload-test-cli-aojz/ns1/topic1] [my-sub] Reconnecting after 
timeout
   12:52:22.243 
[pulsar-client-io-23-1:org.apache.pulsar.client.impl.ClientCnx@720] ERROR 
org.apache.pulsar.client.impl.ClientCnx - [id: 0xbad8308a, L:/127.0.0.1:52042 - 
R:localhost/127.0.0.1:46664] Close connection because received internal-server 
error org.apache.pulsar.client.api.PulsarClientException: 
java.util.concurrent.CompletionException: 
io.netty.channel.AbstractChannel$AnnotatedConnectException: 
syscall:getsockopt(..) failed: Connection refused: 
pulsar-broker-0/172.18.0.6:6650
   12:52:22.244 
[pulsar-client-io-23-1:org.apache.pulsar.client.impl.BinaryProtoLookupService@151]
 WARN  org.apache.pulsar.client.impl.BinaryProtoLookupService - 
[persistent://s3-offload-test-cli-aojz/ns1/topic1] failed to send lookup 
request : org.apache.pulsar.client.api.PulsarClientException$LookupException: 
org.apache.pulsar.client.api.PulsarClientException: 
java.util.concurrent.CompletionException: 
io.netty.channel.AbstractChannel$AnnotatedConnectException: 
syscall:getsockopt(..) failed: Connection refused: 
pulsar-broker-0/172.18.0.6:6650
   java.util.concurrent.CompletionException: 
org.apache.pulsar.client.api.PulsarClientException$LookupException: 
org.apache.pulsar.client.api.PulsarClientException: 
java.util.concurrent.CompletionException: 
io.netty.channel.AbstractChannel$AnnotatedConnectException: 
syscall:getsockopt(..) failed: Connection refused: 
pulsar-broker-0/172.18.0.6:6650
        at 
java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
 ~[?:1.8.0_191]
        at 
java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
 ~[?:1.8.0_191]
        at 
java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:647) 
~[?:1.8.0_191]
        at 
java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:632)
 ~[?:1.8.0_191]
        at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) 
[?:1.8.0_191]
        at 
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
 [?:1.8.0_191]
        at 
org.apache.pulsar.client.impl.ClientCnx.handleLookupResponse(ClientCnx.java:380)
 [pulsar-client-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
        at 
org.apache.pulsar.common.api.PulsarDecoder.channelRead(PulsarDecoder.java:118) 
[pulsar-client-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
        at 
org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
 [pulsar-client-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
        at 
org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
 [pulsar-client-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
        at 
org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
 [pulsar-client-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
        at 
   ```
   
   Then the broker realizes it can't get register, so kills itself.
   ```
   12:52:21.754 [main] ERROR 
org.apache.pulsar.broker.loadbalance.impl.ModularLoadManagerImpl - Broker znode 
- [/loadbalance/brokers/pulsar-broker-0:8080] is own by different 
zookeeper-ssession 72259437335805978 
   12:52:21.755 [main] ERROR 
org.apache.pulsar.broker.loadbalance.impl.ModularLoadManagerImpl - Unable to 
create znode - [/loadbalance/brokers/pulsar-broker-0:8080] for load balance on 
zookeeper 
   ```
   
   When the broker comes back the subscription is reset back to the start 
(because we hadn't acked), but the client continues as if its reading from 
where it left off.
   
   @merlimat @sijie @jiazhai 
   So a couple of questions:
   - Do we expect that users of the consumer keep track to ensure message ids 
are monotonically increasing if they want guarantees on the topic?
   - Why can the client read from a broker before the broker has registered?
   
   All info in build: 
https://builds.apache.org/job/pulsar_precommit_integrationtests/5164/

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

Reply via email to