Shawyeok opened a new pull request, #25825:
URL: https://github.com/apache/pulsar/pull/25825
Fixes NPE in `IsolatedBookieEnsemblePlacementPolicy`
### Motivation
A `NullPointerException` occurs in `getExcludedBookiesWithIsolationGroups`
when `getIsolationGroup()` returns a `Pair` whose `left` value is `null`. This
happens when the `EnsemblePlacementPolicyConfig` carries a policy class name
that does not equal `IsolatedBookieEnsemblePlacementPolicy`. In that case the
`if` block that calls `pair.setLeft()` / `pair.setRight()` is skipped entirely,
leaving both sides of the `MutablePair` as `null`.
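The failure mode described above can be illustrated with a minimal, standalone sketch. This is not the actual Pulsar code: `SimplePair` is a hypothetical stand-in for commons-lang3 `MutablePair`, the policy name is reduced to a plain string, and the group name `primary-group` is made up for illustration.

```java
import java.util.HashSet;
import java.util.Set;

public class IsolationGroupNpeSketch {
    /** Hypothetical stand-in for MutablePair: both sides start out as null. */
    static final class SimplePair<L, R> {
        L left;
        R right;
    }

    static final String ISOLATED_POLICY = "IsolatedBookieEnsemblePlacementPolicy";

    /** Assumed shape of the bug: the pair is only populated when the policy class matches. */
    static SimplePair<Set<String>, Set<String>> getIsolationGroupBuggy(String policyClassName) {
        SimplePair<Set<String>, Set<String>> pair = new SimplePair<>();
        if (ISOLATED_POLICY.equals(policyClassName)) {
            pair.left = new HashSet<>(Set.of("primary-group"));
            pair.right = new HashSet<>();
        }
        // For any other policy class the if block is skipped and both sides stay null.
        return pair;
    }

    /** Returns true if dereferencing the left side throws a NullPointerException. */
    static boolean triggersNpe(String policyClassName) {
        try {
            getIsolationGroupBuggy(policyClassName).left.contains("*");
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(triggersNpe(ISOLATED_POLICY));   // matching policy class
        System.out.println(triggersNpe("SomeOtherPolicy")); // non-matching policy class
    }
}
```

Only the non-matching policy class reaches the `contains` call with a `null` left side, which corresponds to the call at the top of the stack trace.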
This bug was observed during the upgrade from the 2.8 branch to the 3.0
branch. The same defect may also exist in master.
**Full stack trace from production:**
```
2026-05-18T13:28:40.756Z [BookKeeperClientWorker-OrderedExecutor-35-0] ERROR org.apache.bookkeeper.common.util.SingleThreadExecutor - Error while running task: Cannot invoke "java.util.Set.contains(Object)" because the return value of "org.apache.commons.lang3.tuple.Pair.getLeft()" is null
java.lang.NullPointerException: Cannot invoke "java.util.Set.contains(Object)" because the return value of "org.apache.commons.lang3.tuple.Pair.getLeft()" is null
    at org.apache.pulsar.bookie.rackawareness.IsolatedBookieEnsemblePlacementPolicy.getExcludedBookiesWithIsolationGroups(IsolatedBookieEnsemblePlacementPolicy.java:192)
    at org.apache.pulsar.bookie.rackawareness.IsolatedBookieEnsemblePlacementPolicy.getExcludedBookies(IsolatedBookieEnsemblePlacementPolicy.java:141)
    at org.apache.pulsar.bookie.rackawareness.IsolatedBookieEnsemblePlacementPolicy.replaceBookie(IsolatedBookieEnsemblePlacementPolicy.java:127)
    at org.apache.bookkeeper.client.BookieWatcherImpl.replaceBookie(BookieWatcherImpl.java:316)
    at org.apache.bookkeeper.client.EnsembleUtils.replaceBookiesInEnsemble(EnsembleUtils.java:69)
    at org.apache.bookkeeper.client.ReadOnlyLedgerHandle.handleBookieFailure(ReadOnlyLedgerHandle.java:222)
    at org.apache.bookkeeper.client.PendingAddOp.writeComplete(PendingAddOp.java:353)
    at org.apache.bookkeeper.proto.BookieClientImpl.completeAdd(BookieClientImpl.java:287)
    at org.apache.bookkeeper.proto.BookieClientImpl.access$200(BookieClientImpl.java:79)
    at org.apache.bookkeeper.proto.BookieClientImpl$ChannelReadyForAddEntryCallback.lambda$operationComplete$0(BookieClientImpl.java:405)
    at org.apache.bookkeeper.common.util.OrderedExecutor$TimedRunnable.run(OrderedExecutor.java:203)
    at org.apache.bookkeeper.common.util.SingleThreadExecutor.safeRunTask(SingleThreadExecutor.java:137)
    at org.apache.bookkeeper.common.util.SingleThreadExecutor.run(SingleThreadExecutor.java:107)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:840)
```
This error causes topics to time out during loading. Since there is no
automatic recovery path, the only way to restore service is by restarting the
broker.
### Modifications
Two complementary fixes:
1. **`getIsolationGroup()`**: Initialize `MutablePair` with empty sets as
defaults so the returned `Pair` always has non-null values, and remove the
now-redundant `else` branches that were explicitly setting the same empty sets.
2. **`getExcludedBookiesWithIsolationGroups()`** (defensive programming):
Resolve `primaryIsolationGroup` and `secondaryIsolationGroup` at the top of the
method using `ObjectUtils.getIfNull`, before any use. This eliminates both the
original NPE at the early-return `contains` check and a second latent NPE at
lines 218–219 where the pair values were previously assigned and dereferenced
without null guards.
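A rough sketch of how the two fixes fit together, under stated assumptions: `SimplePair` is again a hypothetical stand-in for `MutablePair`, `inAnyGroup` is an illustrative helper rather than a method from the patch, and the JDK's `Objects.requireNonNullElseGet` is used in place of `ObjectUtils.getIfNull` so the example compiles without commons-lang3.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class IsolationGroupFixSketch {
    /** Hypothetical stand-in for MutablePair with an initializing constructor. */
    static final class SimplePair<L, R> {
        L left;
        R right;

        SimplePair(L left, R right) {
            this.left = left;
            this.right = right;
        }
    }

    static final String ISOLATED_POLICY = "IsolatedBookieEnsemblePlacementPolicy";

    /** Fix 1: start from empty sets so the returned pair never holds null. */
    static SimplePair<Set<String>, Set<String>> getIsolationGroup(String policyClassName) {
        SimplePair<Set<String>, Set<String>> pair =
                new SimplePair<>(new HashSet<>(), new HashSet<>());
        if (ISOLATED_POLICY.equals(policyClassName)) {
            pair.left.add("primary-group"); // made-up group name for illustration
        }
        return pair;
    }

    /** Fix 2: resolve both sides to non-null sets before any dereference. */
    static boolean inAnyGroup(SimplePair<Set<String>, Set<String>> pair, String group) {
        Set<String> primary = Objects.requireNonNullElseGet(pair.left, Collections::emptySet);
        Set<String> secondary = Objects.requireNonNullElseGet(pair.right, Collections::emptySet);
        return primary.contains(group) || secondary.contains(group);
    }

    public static void main(String[] args) {
        // Non-matching policy class: empty sets, no exception.
        System.out.println(inAnyGroup(getIsolationGroup("SomeOtherPolicy"), "primary-group"));
        // Even a pair holding nulls is tolerated by the defensive guard.
        System.out.println(inAnyGroup(new SimplePair<>(null, null), "primary-group"));
    }
}
```

Either fix alone would avoid the reported NPE; keeping both means the consumer stays safe even if another code path later returns a pair with `null` sides.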
### Verifying this change
- [ ] Make sure that the change passes the CI checks.
This change added tests and can be verified as follows:
- Added `testReplaceBookieWithNonMatchingPolicyClassShouldNotThrowNPE` to
reproduce the exact production failure: calling `replaceBookie` with custom
metadata whose `EnsemblePlacementPolicyConfig` policy class does not match
`IsolatedBookieEnsemblePlacementPolicy`, which previously caused the NPE.
### Does this pull request potentially affect one of the following parts:
- [ ] Dependencies (add or upgrade a dependency)
- [ ] The public API
- [ ] The schema
- [ ] The default values of configurations
- [ ] The threading model
- [ ] The binary protocol
- [ ] The REST endpoints
- [ ] The admin CLI options
- [ ] The metrics
- [ ] Anything that affects deployment