snowcrumble opened a new issue #6305: [Proxy]If zk leader was down, proxy was 
not work until restart it
URL: https://github.com/apache/pulsar/issues/6305
 
 
   **Describe the bug**
   My pulsar is running on k8s and our service is using pulsar proxy, I have 
did some failure testing.
   When I force crash the node which serve the zk leader, all my pulsar broker 
crash several times because zk connection problem, they were started final, but 
I must restart pulsar proxy after brokers were availabled, mentioned by #6158 . 
   
   **To Reproduce**
   Steps to reproduce the behavior:
   1. Deploy pulsar on k8s
   2. use pulsar-perf tool to connect pulsar proxy and continuous pub/sub on a 
topic
   3. "force" crash a k8s node which has zk leader
   4. The reconnect mechanism doesn't work until restart pulsar proxys.
   
   **Expected behavior**
   Proxy can recover without restart.
   Or broker keep available without restart.
   
   **Screenshots**
   Broker error log
   ```
   04:55:22.078 [main] ERROR org.apache.pulsar.zookeeper.GlobalZooKeeperCache - 
Failed to establish global zookeeper session: null
   java.util.concurrent.TimeoutException: null
        at 
java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784) 
~[?:1.8.0_232]
        at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928) 
~[?:1.8.0_232]
        at 
org.apache.pulsar.zookeeper.GlobalZooKeeperCache.start(GlobalZooKeeperCache.java:69)
 [org.apache.pulsar-pulsar-zookeeper-utils-2.4.2.jar:2.4.2]
        at 
org.apache.pulsar.broker.PulsarService.startZkCacheService(PulsarService.java:572)
 [org.apache.pulsar-pulsar-broker-2.4.2.jar:2.4.2]
        at org.apache.pulsar.broker.PulsarService.start(PulsarService.java:366) 
[org.apache.pulsar-pulsar-broker-2.4.2.jar:2.4.2]
        at 
org.apache.pulsar.PulsarBrokerStarter$BrokerStarter.start(PulsarBrokerStarter.java:273)
 [org.apache.pulsar-pulsar-broker-2.4.2.jar:2.4.2]
        at 
org.apache.pulsar.PulsarBrokerStarter.main(PulsarBrokerStarter.java:332) 
[org.apache.pulsar-pulsar-broker-2.4.2.jar:2.4.2]
   04:55:22.082 [main] ERROR org.apache.pulsar.broker.PulsarService - 
java.io.IOException: java.util.concurrent.TimeoutException
   org.apache.pulsar.broker.PulsarServerException: java.io.IOException: 
java.util.concurrent.TimeoutException
        at 
org.apache.pulsar.broker.PulsarService.startZkCacheService(PulsarService.java:574)
 ~[org.apache.pulsar-pulsar-broker-2.4.2.jar:2.4.2]
        at org.apache.pulsar.broker.PulsarService.start(PulsarService.java:366) 
[org.apache.pulsar-pulsar-broker-2.4.2.jar:2.4.2]
        at 
org.apache.pulsar.PulsarBrokerStarter$BrokerStarter.start(PulsarBrokerStarter.java:273)
 [org.apache.pulsar-pulsar-broker-2.4.2.jar:2.4.2]
        at 
org.apache.pulsar.PulsarBrokerStarter.main(PulsarBrokerStarter.java:332) 
[org.apache.pulsar-pulsar-broker-2.4.2.jar:2.4.2]
   Caused by: java.io.IOException: java.util.concurrent.TimeoutException
        at 
org.apache.pulsar.zookeeper.GlobalZooKeeperCache.start(GlobalZooKeeperCache.java:76)
 ~[org.apache.pulsar-pulsar-zookeeper-utils-2.4.2.jar:2.4.2]
        at 
org.apache.pulsar.broker.PulsarService.startZkCacheService(PulsarService.java:572)
 ~[org.apache.pulsar-pulsar-broker-2.4.2.jar:2.4.2]
        ... 3 more
   Caused by: java.util.concurrent.TimeoutException
        at 
java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784) 
~[?:1.8.0_232]
        at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928) 
~[?:1.8.0_232]
        at 
org.apache.pulsar.zookeeper.GlobalZooKeeperCache.start(GlobalZooKeeperCache.java:69)
 ~[org.apache.pulsar-pulsar-zookeeper-utils-2.4.2.jar:2.4.2]
        at 
org.apache.pulsar.broker.PulsarService.startZkCacheService(PulsarService.java:572)
 ~[org.apache.pulsar-pulsar-broker-2.4.2.jar:2.4.2]
        ... 3 more
   04:55:22.083 [main] ERROR org.apache.pulsar.PulsarBrokerStarter - Failed to 
start pulsar service.
   org.apache.pulsar.broker.PulsarServerException: 
org.apache.pulsar.broker.PulsarServerException: java.io.IOException: 
java.util.concurrent.TimeoutException
        at org.apache.pulsar.broker.PulsarService.start(PulsarService.java:472) 
~[org.apache.pulsar-pulsar-broker-2.4.2.jar:2.4.2]
        at 
org.apache.pulsar.PulsarBrokerStarter$BrokerStarter.start(PulsarBrokerStarter.java:273)
 ~[org.apache.pulsar-pulsar-broker-2.4.2.jar:2.4.2]
        at 
org.apache.pulsar.PulsarBrokerStarter.main(PulsarBrokerStarter.java:332) 
[org.apache.pulsar-pulsar-broker-2.4.2.jar:2.4.2]
   Caused by: org.apache.pulsar.broker.PulsarServerException: 
java.io.IOException: java.util.concurrent.TimeoutException
        at 
org.apache.pulsar.broker.PulsarService.startZkCacheService(PulsarService.java:574)
 ~[org.apache.pulsar-pulsar-broker-2.4.2.jar:2.4.2]
        at org.apache.pulsar.broker.PulsarService.start(PulsarService.java:366) 
~[org.apache.pulsar-pulsar-broker-2.4.2.jar:2.4.2]
        ... 2 more
   Caused by: java.io.IOException: java.util.concurrent.TimeoutException
        at 
org.apache.pulsar.zookeeper.GlobalZooKeeperCache.start(GlobalZooKeeperCache.java:76)
 ~[org.apache.pulsar-pulsar-zookeeper-utils-2.4.2.jar:2.4.2]
        at 
org.apache.pulsar.broker.PulsarService.startZkCacheService(PulsarService.java:572)
 ~[org.apache.pulsar-pulsar-broker-2.4.2.jar:2.4.2]
        at org.apache.pulsar.broker.PulsarService.start(PulsarService.java:366) 
~[org.apache.pulsar-pulsar-broker-2.4.2.jar:2.4.2]
        ... 2 more
   Caused by: java.util.concurrent.TimeoutException
        at 
java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784) 
~[?:1.8.0_232]
        at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928) 
~[?:1.8.0_232]
        at 
org.apache.pulsar.zookeeper.GlobalZooKeeperCache.start(GlobalZooKeeperCache.java:69)
 ~[org.apache.pulsar-pulsar-zookeeper-utils-2.4.2.jar:2.4.2]
        at 
org.apache.pulsar.broker.PulsarService.startZkCacheService(PulsarService.java:572)
 ~[org.apache.pulsar-pulsar-broker-2.4.2.jar:2.4.2]
        at org.apache.pulsar.broker.PulsarService.start(PulsarService.java:366) 
~[org.apache.pulsar-pulsar-broker-2.4.2.jar:2.4.2]
        ... 2 more
   04:55:22.084 [pulsar-ordered-OrderedExecutor-3-0-SendThread(zookeeper:2181)] 
WARN  org.apache.zookeeper.ClientCnxn - Client session timed out, have not 
heard from server in 30005ms for sessionid 0x0
   ```
   
   zk-2.log (new leader)
   ```
   04:51:47.727 [QuorumPeer[myid=3]/0.0.0.0:2181] INFO  
org.apache.zookeeper.server.quorum.QuorumPeer - LEADING
   04:51:47.727 [QuorumPeer[myid=3]/0.0.0.0:2181] INFO  
org.apache.zookeeper.server.ZooKeeperServer - Created server with tickTime 2000 
minSessionTimeout 4000 maxSessionTimeout 40000 datadir data/zookeeper/version-2 
snapdir data/zookeeper/version-2
   04:51:47.727 [QuorumPeer[myid=3]/0.0.0.0:2181] INFO  
org.apache.zookeeper.server.quorum.Leader - LEADING - LEADER ELECTION TOOK - 
10421
   04:51:47.734 [LearnerHandler-/10.30.2.141:43982] INFO  
org.apache.zookeeper.server.quorum.LearnerHandler - Follower sid: 1 : info : 
org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@6fe50a68
   ```
   
   perf-tool output
   ```
   06:14:00.049 [pulsar-timer-5-1] INFO  
org.apache.pulsar.client.impl.ConnectionHandler - 
[persistent://public/default/topic-perf] [cn-beijing-105-1] Reconnecting after 
connection was closed
   06:14:00.111 [pulsar-client-io-2-2] INFO  
org.apache.pulsar.client.impl.ConnectionPool - [[id: 0xfa71e541, 
L:/172.17.0.2:47758 - R:/172.16.11.200:6650]] Connected to server
   06:14:00.114 [main] INFO  org.apache.pulsar.testclient.PerformanceProducer - 
Throughput produced:      0.0  msg/s ---      0.0 Mbit/s --- failure      0.0 
msg/s --- Latency: mean:   0.000 ms - med:   0.000 - 95pct:   0.000 - 99pct:   
0.000 - 99.9pct:   0.000 - 99.99pct:   0.000 - Max:   0.000
   06:14:00.345 [pulsar-client-io-2-2] ERROR 
org.apache.pulsar.client.impl.ClientCnx - [id: 0xfa71e541, L:/172.17.0.2:47758 
- R:/172.16.11.200:6650] Close connection because received internal-server 
error No active broker is available
   06:14:00.346 [pulsar-client-io-2-2] WARN  
org.apache.pulsar.client.impl.BinaryProtoLookupService - 
[persistent://public/default/topic-perf] failed to send lookup request : 
org.apache.pulsar.client.api.PulsarClientException$LookupException: No active 
broker is available
   06:14:00.347 [pulsar-client-io-2-2] WARN  
org.apache.pulsar.client.impl.ConnectionHandler - 
[persistent://public/default/topic-perf] [cn-beijing-105-1] Error connecting to 
broker: org.apache.pulsar.client.api.PulsarClientException$LookupException: No 
active broker is available
   06:14:00.348 [pulsar-client-io-2-2] WARN  
org.apache.pulsar.client.impl.ConnectionHandler - 
[persistent://public/default/topic-perf] [cn-beijing-105-1] Could not get 
connection to broker: 
org.apache.pulsar.client.api.PulsarClientException$LookupException: No active 
broker is available -- Will try again in 57.763 s
   06:14:00.348 [pulsar-client-io-2-2] INFO  
org.apache.pulsar.client.impl.ClientCnx - [id: 0xfa71e541, L:/172.17.0.2:47758 
! R:/172.16.11.200:6650] Disconnected
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

Reply via email to