devinbost commented on issue #8333:
URL: https://github.com/apache/pulsar/issues/8333#issuecomment-714772055
One of the ZooKeeper servers was logging these NodeExists exceptions:
```
22:09:01.713 [ProcessThread(sid:3 cport:-1):] INFO
org.apache.zookeeper.server.PrepRequestProcessor - Got user-level
KeeperException when processing sessionid:0x106ae1d22100053 type:create
cxid:0x1 zxid:0x1de000dfcbd txntype:-1 reqpath:n/a Error Path:/pulsar/functions
Error:**KeeperErrorCode = NodeExists for /pulsar/functions**
22:09:04.339 [ProcessThread(sid:3 cport:-1):] INFO
org.apache.zookeeper.server.PrepRequestProcessor - Got user-level
KeeperException when processing sessionid:0x206ae1a6e50005d type:create
cxid:0x178 zxid:0x1de000e098c txntype:-1 reqpath:n/a Error
Path:/loadbalance/brokers/server08.domain.com:8080 **Error:KeeperErrorCode =
NodeExists for /loadbalance/brokers/server08.domain.com:8080**
```
None of the Pulsar functions were getting assignments when this
happened. FYI @jerrypeng ^^^
We were also seeing "Directory not empty" errors on delete requests, like these:
```
22:09:02.221 [ProcessThread(sid:3 cport:-1):] INFO
org.apache.zookeeper.server.PrepRequestProcessor - Got user-level
KeeperException when processing sessionid:0x206ae1a6e500001 type:delete
cxid:0x9d808 zxid:0x1de000dff81 txntype:-1 reqpath:n/a Error
Path:/ledgers/underreplication/ledgers/0000/0000/2a0e Error:KeeperErrorCode =
Directory not empty for /ledgers/underreplication/ledgers/0000/0000/2a0e
22:09:03.207 [ProcessThread(sid:3 cport:-1):] INFO
org.apache.zookeeper.server.PrepRequestProcessor - Got user-level
KeeperException when processing sessionid:0x206ae0c67f00003 type:delete
cxid:0x190d3e zxid:0x1de000e02c5 txntype:-1 reqpath:n/a Error
Path:/ledgers/underreplication/ledgers/0000/0001/0990 Error:KeeperErrorCode =
Directory not empty for /ledgers/underreplication/ledgers/0000/0001/0990
```
When the brokers started restarting (after the bookies became unhealthy), this
error also began repeating in the broker logs:
```
2020-10-19T21:34:08,200 [main] ERROR org.apache.pulsar.PulsarBrokerStarter -
Failed to start pulsar service.
org.apache.pulsar.broker.PulsarServerException:
org.apache.pulsar.broker.PulsarServerException:
org.apache.pulsar.broker.PulsarServerException: **Broker-znode owned by
different zk-session** 145995360955662414
at org.apache.pulsar.broker.PulsarService.start(PulsarService.java:453)
```
We also saw that same exception thrown from ModularLoadManagerImpl:
```
2020-10-19T21:13:12,033 [main] ERROR
org.apache.pulsar.broker.loadbalance.impl.ModularLoadManagerImpl - **Unable to
create znode** - [/loadbalance/brokers/server12.domain.com:8080] for load
balance on zookeeper
org.apache.pulsar.broker.PulsarServerException: **Broker-znode owned by
different zk-session** 73937685957574781
at
org.apache.pulsar.broker.loadbalance.impl.ModularLoadManagerImpl.start(ModularLoadManagerImpl.java:798)
```
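For anyone trying to correlate these, note that the broker exception prints a decimal number after "different zk-session", while the ZooKeeper logs print `sessionid:` in hex. Assuming that decimal number is the session id that owns the znode, a minimal Python sketch for converting between the two forms could look like this (the helper names `zk_session_hex` and `same_session` are made up for illustration and are not part of Pulsar or ZooKeeper):

```python
def zk_session_hex(session_id: int) -> str:
    """Render a decimal session id in the 0x... form used by ZooKeeper logs."""
    return hex(session_id)


def same_session(decimal_id: int, logged_hex: str) -> bool:
    """Return True if a decimal id and a logged hex sessionid are the same value."""
    return decimal_id == int(logged_hex, 16)


if __name__ == "__main__":
    # Decimal values copied from the two broker exceptions above.
    for sid in (145995360955662414, 73937685957574781):
        print(sid, "->", zk_session_hex(sid))
```

The hex output can then be searched for in the ZooKeeper server logs to see which session still holds the broker znode.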