This is an automated email from the ASF dual-hosted git repository.
zhaijia pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/pulsar.git
The following commit(s) were added to refs/heads/master by this push:
new b57c163 Fix stuck lookup operations when the broker is starting up (#8273)
b57c163 is described below
commit b57c1630e2478755c05a147bfaf11d9a723cd28e
Author: Matteo Merli <[email protected]>
AuthorDate: Fri Oct 16 05:49:02 2020 -0700
Fix stuck lookup operations when the broker is starting up (#8273)
Motivation
When the broker is starting up, it might start getting lookup requests
before all the components of the service are fully initialized. In this
particular case a lookup will fail with an NPE because the leader election service
is not ready yet (it gets instantiated after the broker service).
This NPE causes a series of rippling effects:
 1. The futures for the requests that hit the NPE are never completed.
 2. They stay stale in the findingBundlesNotAuthoritative cache map forever.
 3. All other lookup requests piggy-back on these first futures, which will
    never complete.
 4. The broker reaches the max number of pending lookup requests, after which
    it rejects new lookups.
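
The chain of effects above can be reproduced in a small, self-contained
sketch. Everything in it is hypothetical and simplified: the
LookupDedupSketch class, its pending map (a stand-in for
findingBundlesNotAuthoritative) and the LeaderElection interface are
illustration-only names, not Pulsar's actual code. With the guard disabled,
the first lookup hits the NPE, its future is never completed, and later
lookups for the same bundle receive that same stuck future. With the guard
enabled, the future completes with Optional.empty() and the map entry is
released.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

final class LookupDedupSketch {
    // Hypothetical stand-in for the leader election service.
    interface LeaderElection {
        boolean isLeader();
    }

    // Hypothetical stand-in for the findingBundlesNotAuthoritative map:
    // one shared future per bundle while a lookup is in flight.
    private final ConcurrentHashMap<String, CompletableFuture<Optional<String>>> pending =
            new ConcurrentHashMap<>();

    // Stays null until the (simulated) broker finishes starting up.
    private volatile LeaderElection leaderElection;

    private final boolean guardEnabled;

    LookupDedupSketch(boolean guardEnabled) {
        this.guardEnabled = guardEnabled;
    }

    void setLeaderElection(LeaderElection les) {
        this.leaderElection = les;
    }

    int pendingCount() {
        return pending.size();
    }

    CompletableFuture<Optional<String>> lookup(String bundle) {
        CompletableFuture<Optional<String>> created = new CompletableFuture<>();
        CompletableFuture<Optional<String>> existing = pending.putIfAbsent(bundle, created);
        if (existing != null) {
            // Piggy-back on the lookup that is already in flight.
            return existing;
        }
        // The map entry is released only when the future completes.
        created.whenComplete((result, error) -> pending.remove(bundle, created));
        try {
            search(created);
        } catch (NullPointerException e) {
            // Mimics an exception lost in an async callback: nobody
            // completes the future, so the map entry is never removed.
        }
        return created;
    }

    private void search(CompletableFuture<Optional<String>> future) {
        LeaderElection les = leaderElection;
        if (guardEnabled && les == null) {
            // Same idea as the fix: complete the future instead of failing.
            future.complete(Optional.empty());
            return;
        }
        boolean authoritative = les.isLeader(); // NPE when les == null
        future.complete(Optional.of(authoritative ? "authoritative" : "redirect"));
    }

    public static void main(String[] args) {
        LookupDedupSketch unguarded = new LookupDedupSketch(false);
        System.out.println("unguarded done: " + unguarded.lookup("ns/bundle-0").isDone()
                + ", pending: " + unguarded.pendingCount());

        LookupDedupSketch guarded = new LookupDedupSketch(true);
        System.out.println("guarded done: " + guarded.lookup("ns/bundle-0").isDone()
                + ", pending: " + guarded.pendingCount());
    }
}
```

In the unguarded instance the stuck entry survives even after
setLeaderElection is called, because later lookups for the same bundle keep
receiving the original future. This matches the "stale forever" behaviour
described above.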
---
.../apache/pulsar/broker/namespace/NamespaceService.java | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/pulsar-broker/src/main/java/org/apache/pulsar/broker/namespace/NamespaceService.java b/pulsar-broker/src/main/java/org/apache/pulsar/broker/namespace/NamespaceService.java
index c00e802..511ea12 100644
--- a/pulsar-broker/src/main/java/org/apache/pulsar/broker/namespace/NamespaceService.java
+++ b/pulsar-broker/src/main/java/org/apache/pulsar/broker/namespace/NamespaceService.java
@@ -30,6 +30,7 @@ import org.apache.pulsar.broker.PulsarServerException;
import org.apache.pulsar.broker.PulsarService;
import org.apache.pulsar.broker.ServiceConfiguration;
import org.apache.pulsar.broker.admin.AdminResource;
+import org.apache.pulsar.broker.loadbalance.LeaderElectionService;
import org.apache.pulsar.broker.loadbalance.LoadManager;
import org.apache.pulsar.broker.loadbalance.ResourceUnit;
import org.apache.pulsar.broker.lookup.LookupResult;
@@ -404,7 +405,17 @@ public class NamespaceService {
return;
}
String candidateBroker = null;
- boolean authoritativeRedirect = pulsar.getLeaderElectionService().isLeader();
+
+ LeaderElectionService les = pulsar.getLeaderElectionService();
+ if (les == null) {
+ // The leader election service was not initialized yet. This can happen because the broker service is
+ // initialized first and it might start receiving lookup requests before the leader election service is
+ // fully initialized.
+ lookupFuture.complete(Optional.empty());
+ return;
+ }
+
+ boolean authoritativeRedirect = les.isLeader();
try {
// check if this is Heartbeat or SLAMonitor namespace