GitHub user ahabel-wob edited a discussion: Namespace bundle ownership during 
kubernetes node upgrades

We run Apache Pulsar on Kubernetes, and in some cases it does not survive the 
node upgrade process.
We are running Pulsar on 4 nodes with the following pods: 4 brokers, 4 
bookkeepers, 3 zookeepers, 3 proxies, 1 bastion, and 1 autorecovery. In the 
logs we see this error (for all kinds of topics):

```
Failed to create consumer: consumerId=23954, Namespace bundle for topic (persistent://app/platform-prod/__transaction_buffer_snapshot-partition-4) not served by this instance:app-pulsar-prd-broker-1.app-pulsar-prd-broker.apache-pulsar.svc.cluster.local:8080. Please redo the lookup. Request is denied: namespace=app/platform-prod
```

The error is logged between the proxy and the brokers, and the new lookup it 
asks for does not succeed. A broker can stay stuck like this for hours (for 
example, 6 hours). Only a manual broker restart resolved the issue. What 
causes this, and how can we configure our Pulsar cluster to be more resilient?
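
For anyone trying to reproduce or diagnose this, here is a minimal sketch (not part of our deployment) that pulls the topic and the rejecting broker out of an error line like the one quoted above and builds the broker's REST lookup URL for that topic, so the owner reported by a lookup can be compared with the broker that denied the request. The function names and the `admin_base` value are hypothetical, and the `/lookup/v2/topic/...` path is assumed to match the Pulsar version in use.

```python
import re
from urllib.parse import quote

# Matches the broker error quoted above once wrapped lines are joined.
ERROR_RE = re.compile(
    r"Namespace bundle for topic \((?P<topic>(?:non-)?persistent://[^)]+)\) "
    r"not served by this instance:\s*(?P<broker>\S+?)\.\s+Please redo the lookup"
)


def parse_ownership_error(line):
    """Return (topic, broker) from the error line, or None if it does not match."""
    normalized = " ".join(line.split())  # undo log/email line wrapping
    match = ERROR_RE.search(normalized)
    if match is None:
        return None
    return match.group("topic"), match.group("broker")


def lookup_url(admin_base, topic):
    """Build the REST lookup URL for a topic (endpoint path is an assumption)."""
    domain, rest = topic.split("://", 1)
    tenant, namespace, name = rest.split("/", 2)
    return (
        f"{admin_base.rstrip('/')}/lookup/v2/topic/"
        f"{domain}/{tenant}/{namespace}/{quote(name, safe='')}"
    )
```

Sending a GET request to the resulting URL against a healthy broker should return the broker address that currently owns the topic. If that address differs from the broker named in the error, the failing side is probably working from a stale lookup result.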

The AI assistant in the Pulsar Slack gave the following response:
```
The issue you're experiencing seems related to the broker's inability to 
correctly handle namespace bundle ownership during node upgrades or restarts. 
This can lead to lookup failures and prolonged unavailability of certain 
topics. According to the Pulsar documentation on load balancing, topics are 
dynamically assigned to brokers based on load conditions, and issues can arise 
if the ownership of namespace bundles is not correctly managed during broker 
restarts.
To improve resilience, consider the following:
1. Graceful Cluster Rollout: Implement a Graceful Cluster Rollout strategy. 
This involves using StatefulSets with revision hashes to manage broker 
upgrades, ensuring that traffic is gradually migrated from old to new broker 
pods without disrupting service.
2. Load Balancing Configuration: Ensure that your load balancing settings are 
optimized. You can adjust the number of namespace bundles to better distribute 
the load across brokers, as described in the Pulsar load balancing guide.
```

GitHub link: https://github.com/apache/pulsar/discussions/23873
