Shawyeok opened a new pull request, #25825:
URL: https://github.com/apache/pulsar/pull/25825

   Fixes NPE in `IsolatedBookieEnsemblePlacementPolicy`
   
   ### Motivation
   
   A `NullPointerException` occurs in `getExcludedBookiesWithIsolationGroups` 
when `getIsolationGroup()` returns a `Pair` whose `left` value is `null`. This 
happens when the `EnsemblePlacementPolicyConfig` carries a policy class name 
that does not equal `IsolatedBookieEnsemblePlacementPolicy` — in that case the 
`if` block that calls `pair.setLeft()` / `pair.setRight()` is skipped entirely, 
leaving both sides of the `MutablePair` as `null`.
   
   This bug was observed during the upgrade from the 2.8 branch to the 3.0 
branch. The same defect may also exist in master.
   
   **Full stack trace from production:**
   
   ```
   2026-05-18T13:28:40.756Z [BookKeeperClientWorker-OrderedExecutor-35-0] ERROR 
org.apache.bookkeeper.common.util.SingleThreadExecutor - Error while running 
task: Cannot invoke "java.util.Set.contains(Object)" because the return value 
of "org.apache.commons.lang3.tuple.Pair.getLeft()" is null
   java.lang.NullPointerException: Cannot invoke 
"java.util.Set.contains(Object)" because the return value of 
"org.apache.commons.lang3.tuple.Pair.getLeft()" is null
       at 
org.apache.pulsar.bookie.rackawareness.IsolatedBookieEnsemblePlacementPolicy.getExcludedBookiesWithIsolationGroups(IsolatedBookieEnsemblePlacementPolicy.java:192)
       at 
org.apache.pulsar.bookie.rackawareness.IsolatedBookieEnsemblePlacementPolicy.getExcludedBookies(IsolatedBookieEnsemblePlacementPolicy.java:141)
       at 
org.apache.pulsar.bookie.rackawareness.IsolatedBookieEnsemblePlacementPolicy.replaceBookie(IsolatedBookieEnsemblePlacementPolicy.java:127)
       at 
org.apache.bookkeeper.client.BookieWatcherImpl.replaceBookie(BookieWatcherImpl.java:316)
       at 
org.apache.bookkeeper.client.EnsembleUtils.replaceBookiesInEnsemble(EnsembleUtils.java:69)
       at 
org.apache.bookkeeper.client.ReadOnlyLedgerHandle.handleBookieFailure(ReadOnlyLedgerHandle.java:222)
       at 
org.apache.bookkeeper.client.PendingAddOp.writeComplete(PendingAddOp.java:353)
       at 
org.apache.bookkeeper.proto.BookieClientImpl.completeAdd(BookieClientImpl.java:287)
       at 
org.apache.bookkeeper.proto.BookieClientImpl.access$200(BookieClientImpl.java:79)
       at 
org.apache.bookkeeper.proto.BookieClientImpl$ChannelReadyForAddEntryCallback.lambda$operationComplete$0(BookieClientImpl.java:405)
       at 
org.apache.bookkeeper.common.util.OrderedExecutor$TimedRunnable.run(OrderedExecutor.java:203)
       at 
org.apache.bookkeeper.common.util.SingleThreadExecutor.safeRunTask(SingleThreadExecutor.java:137)
       at 
org.apache.bookkeeper.common.util.SingleThreadExecutor.run(SingleThreadExecutor.java:107)
       at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
       at java.base/java.lang.Thread.run(Thread.java:840)
   ```
   
   This error causes topics to time out during loading. Since there is no 
automatic recovery path, the only way to restore service is by restarting the 
broker.
   
   ### Modifications
   
   Two complementary fixes:
   
   1. **`getIsolationGroup()`**: Initialize `MutablePair` with empty sets as 
defaults so the returned `Pair` always has non-null values, and remove the 
now-redundant `else` branches that were explicitly setting the same empty sets.
   
   2. **`getExcludedBookiesWithIsolationGroups()`** (defensive programming): 
Resolve `primaryIsolationGroup` and `secondaryIsolationGroup` at the top of the 
method using `ObjectUtils.getIfNull`, before any use. This eliminates both the 
original NPE at the early-return `contains` check and a second latent NPE at 
lines 218–219 where the pair values were previously assigned and dereferenced 
without null guards.
   
   ### Verifying this change
   
   - [ ] Make sure that the change passes the CI checks.
   
   This change added tests and can be verified as follows:
   
   - Added `testReplaceBookieWithNonMatchingPolicyClassShouldNotThrowNPE` to 
reproduce the exact production failure: calling `replaceBookie` with custom 
metadata whose `EnsemblePlacementPolicyConfig` policy class does not match 
`IsolatedBookieEnsemblePlacementPolicy`, which previously caused the NPE.
   
   ### Does this pull request potentially affect one of the following parts:
   
   - [ ] Dependencies (add or upgrade a dependency)
   - [ ] The public API
   - [ ] The schema
   - [ ] The default values of configurations
   - [ ] The threading model
   - [ ] The binary protocol
   - [ ] The REST endpoints
   - [ ] The admin CLI options
   - [ ] The metrics
   - [ ] Anything that affects deployment


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to