devinbost commented on issue #8333:
URL: https://github.com/apache/pulsar/issues/8333#issuecomment-714772055


   One of the ZooKeeper servers was logging these NodeExists exceptions:
   
   ```
   22:09:01.713 [ProcessThread(sid:3 cport:-1):] INFO  org.apache.zookeeper.server.PrepRequestProcessor - Got user-level KeeperException when processing sessionid:0x106ae1d22100053 type:create cxid:0x1 zxid:0x1de000dfcbd txntype:-1 reqpath:n/a Error Path:/pulsar/functions Error:**KeeperErrorCode = NodeExists for /pulsar/functions**
   22:09:04.339 [ProcessThread(sid:3 cport:-1):] INFO  org.apache.zookeeper.server.PrepRequestProcessor - Got user-level KeeperException when processing sessionid:0x206ae1a6e50005d type:create cxid:0x178 zxid:0x1de000e098c txntype:-1 reqpath:n/a Error Path:/loadbalance/brokers/server08.domain.com:8080 **Error:KeeperErrorCode = NodeExists for /loadbalance/brokers/server08.domain.com:8080**
   ```
    
   Also, none of the Pulsar functions were getting assignments when this 
happened. FYI @jerrypeng ^^^
   
   Also, we were seeing errors like these:
   
   ```
   22:09:02.221 [ProcessThread(sid:3 cport:-1):] INFO  org.apache.zookeeper.server.PrepRequestProcessor - Got user-level KeeperException when processing sessionid:0x206ae1a6e500001 type:delete cxid:0x9d808 zxid:0x1de000dff81 txntype:-1 reqpath:n/a Error Path:/ledgers/underreplication/ledgers/0000/0000/2a0e Error:KeeperErrorCode = Directory not empty for /ledgers/underreplication/ledgers/0000/0000/2a0e
   22:09:03.207 [ProcessThread(sid:3 cport:-1):] INFO  org.apache.zookeeper.server.PrepRequestProcessor - Got user-level KeeperException when processing sessionid:0x206ae0c67f00003 type:delete cxid:0x190d3e zxid:0x1de000e02c5 txntype:-1 reqpath:n/a Error Path:/ledgers/underreplication/ledgers/0000/0001/0990 Error:KeeperErrorCode = Directory not empty for /ledgers/underreplication/ledgers/0000/0001/0990
   ```
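
A minimal sketch for tallying user-level KeeperException entries like the ones above by operation type and error code, which may help show which znode paths are affected most often. It assumes each log entry is passed in as one string (wrapped lines already joined), and `parse_line` and `tally` are hypothetical helper names, not part of ZooKeeper or Pulsar.

```python
import re
from collections import Counter

# Fields as they appear in the PrepRequestProcessor lines quoted above.
# The error code may contain spaces (e.g. "Directory not empty"), so it is
# matched lazily up to the " for <path>" suffix.
LINE_RE = re.compile(
    r"sessionid:(0x[0-9a-f]+)\s+type:(\w+)\s+.*?"
    r"Error\s+Path:(\S+)\s+Error:KeeperErrorCode\s*=\s*(.+?)\s+for\s+"
)


def parse_line(line):
    """Return (session_id, op_type, path, error_code), or None if no match."""
    # Drop markdown emphasis markers and collapse whitespace left by wrapping.
    cleaned = " ".join(line.replace("**", "").split())
    match = LINE_RE.search(cleaned)
    return match.groups() if match else None


def tally(lines):
    """Count entries per (op_type, error_code) pair, skipping unparsed lines."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            _session, op_type, _path, code = parsed
            counts[(op_type, code)] += 1
    return counts
```

Calling `tally(open("zookeeper.log"))` on a log with one entry per line (the file name is only an example) would give a `Counter` keyed by pairs such as `("create", "NodeExists")`.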
   
   When the brokers started restarting (after the bookies became unhealthy), we 
also started seeing this error repeat in the broker logs:
   
   ```
   2020-10-19T21:34:08,200 [main] ERROR org.apache.pulsar.PulsarBrokerStarter - Failed to start pulsar service.
   org.apache.pulsar.broker.PulsarServerException: org.apache.pulsar.broker.PulsarServerException: org.apache.pulsar.broker.PulsarServerException: **Broker-znode owned by different zk-session** 145995360955662414
        at org.apache.pulsar.broker.PulsarService.start(PulsarService.java:453)
   ```
   We also saw that same exception thrown from ModularLoadManagerImpl:
   
   ```
   2020-10-19T21:13:12,033 [main] ERROR org.apache.pulsar.broker.loadbalance.impl.ModularLoadManagerImpl - **Unable to create znode** - [/loadbalance/brokers/server12.domain.com:8080] for load balance on zookeeper
   org.apache.pulsar.broker.PulsarServerException: **Broker-znode owned by different zk-session** 73937685957574781
        at org.apache.pulsar.broker.loadbalance.impl.ModularLoadManagerImpl.start(ModularLoadManagerImpl.java:798)
   ```
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

