paulrutter commented on PR #503:
URL: https://github.com/apache/felix-dev/pull/503#issuecomment-4418239596
# Review: FELIX-6828 whiteboard startup race fix
Single commit, 2 files changed: `WhiteboardContextHandler.java` and
`WhiteboardManager.java`. Builds on the prior `00e2640c37` shutdown-NPE fix
already on master.
## What the patch does (correctly)
**1. `stop()` reordering + lock coverage** (`WhiteboardManager.java:198-227`)
- Closes trackers *before* nulling `webContext`, so any in-flight
`removedService` → `removeContextHelper` callback still sees a valid
`webContext` and `webContext != null` activate-next branch.
- Wraps `webContext = null` and all map clears in
`synchronized(contextMap)`, so no other operation can observe partially-cleared
state. The comment is accurate and useful.
**2. `addContextHelper` null guard** (`WhiteboardManager.java:383-388`)
- Reads `webContext` into a local under the lock and bails with `false` if
null. Prevents constructing a handler with a null `@NotNull` context — the
actual fix for the startup/shutdown race that previously NPE'd deep inside
`activate()` and left things in an unrecoverable state.
**3. TOCTOU-safe `getRegistry()` reads**
All 7 sites that chain `handler.getRegistry().getEventListenerRegistry()...`
or `handler.getRegistry().registerXxx(...)` now read the volatile registry once
into a local and null-check it. Grep confirms no remaining unguarded chains.
The `registry` field is set to null in both
`WhiteboardContextHandler.deactivate()` and on a failed `activate()`, so this
is a real race even within the contextMap-synchronized paths (the volatile read
isn't pinned by that lock — `activate()` itself transiently leaves `registry`
non-null only after a fresh assignment).
**4. Uninstalled-bundle guard in `unregisterWhiteboardService`**
(`WhiteboardManager.java:881-927`)
- `info.getServiceReference().getBundle()` returns null when the registering
bundle has been uninstalled (common at framework shutdown). The old code passed
that directly to `ungetServletContext(@NotNull Bundle)` → NPE. Fix reads once,
null-checks before each `ungetServletContext` call. Correct.
- Also early-returns with a `FAILURE_REASON_SERVLET_CONTEXT_FAILURE` failure
when `reg == null` in `registerWhiteboardService` — paired with
`removeWhiteboardService`'s `failureStateHandler.remove(info, ctxId)` check, so
the un-incremented `perBundleContextMap` counter is not later decremented.
Symmetry checks out.
## Minor observations
- **Dead-code defensive check** in
`WhiteboardContextHandler.activate():90-93`: `webContext` is `final` and the
constructor declares `@NotNull`, plus the only caller path now null-checks
before construction. The new guard is unreachable in practice. It's harmless
belt-and-suspenders, but if you want to keep it, a one-liner comment
("`@NotNull` is not runtime-enforced") would be worth its presence.
- **Not addressed (likely out of scope)**: `addWhiteboardService` for
`PreprocessorInfo` at `WhiteboardManager.java:581-583` still reads
`this.webContext` unsynchronized and unguarded, then constructs a
`PreprocessorHandler` with it. Same race shape as the SCH path, fewer
consequences (no permanent 404), but inconsistent with the rest of the fix.
Worth a follow-up.
- **Style**: matches existing project conventions (spaces inside parens,
`@Nullable` return on `getRegistry()`, imports grouped properly). New `Bundle`
and `PerContextHandlerRegistry` imports are needed and clean.
- **Tests**: no test coverage for `WhiteboardManager` exists in the repo
(`http/base/src/test` has none for whiteboard). Race conditions like this are
notoriously hard to unit-test, but it's worth noting the safety net is absent.
## Verdict
Solid, well-scoped defensive patch. The reordering, lock extension, TOCTOU
pattern, and Bundle null guard are all correct and address real failure modes.
I'd recommend either deleting the unreachable `webContext == null` check in
`WhiteboardContextHandler.activate()` or annotating it, and filing a follow-up
for the preprocessor path. Otherwise: ready.
@royteeuwen can you look at the minor observations? Especially the dead-code
part?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]