Roy Teeuwen created FELIX-6828:
----------------------------------

             Summary: Whiteboard startup race produces permanent 404s
                 Key: FELIX-6828
                 URL: https://issues.apache.org/jira/browse/FELIX-6828
             Project: Felix
          Issue Type: Bug
          Components: HTTP Service
    Affects Versions: http.jetty12-1.1.8, http.base-5.1.16
            Reporter: Roy Teeuwen


*Summary*

Under concurrent `org.apache.felix.http` `ConfigurationAdmin` updates and
shadowed `ServletContextHelper` churn, `WhiteboardManager` gets stuck in a
state where every URL returns 404. Servlet registrations made afterwards
log success but Jetty's URL routing never picks them up. The corruption is
permanent for the JVM lifetime; only restarting the Felix HTTP bundle (or
the JVM) recovers.

*When it happens*

Triggered in an OSGi container when both of the following happen together:

1. **A stop/start cycle of the HTTP stack mid-traffic.** Any
   `ConfigurationAdmin` update for PID `org.apache.felix.http` causes
   `JettyService.updated()` to call `stopJetty()` then `startJetty()`, which
   tears down and rebuilds `WhiteboardManager`. Sling Starter 14 hits this
   on package install: `org.apache.sling.installer.factory.packages`
   triggers a bundle refresh, which restarts the Felix HTTP bundle and
   re-delivers the persisted config.

2. **Concurrent `ServletContextHelper` registration / unregistration on
   another thread.** In the bundle-refresh wave, several Sling bundles
   (Sling Engine, `SlingHttpContext`, etc.) register/unregister their SCHs
   while the HTTP stack is stopping/starting. 

*Root cause*

`WhiteboardManager.stop()` nulls `this.webContext` **before** it closes its
service trackers. While the trackers are still open, a concurrent
`registerService(ServletContextHelper)` from another thread synchronously
fires `addingService` on the open tracker, which calls
`addContextHelper(...)` and reads the now-`null` `webContext` into a new
`WhiteboardContextHandler`. The handler's `activate(...)` then tries to
build a `SharedServletContextImpl(webContext, ...)`, whose constructor
unconditionally calls `webContext.getContextPath()` and NPEs.
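The ordering problem can be illustrated with a small single-threaded sketch. It is not the Felix code: `TeardownOrderSketch`, `WebContext`, `trackerOpen` and the method names are hypothetical stand-ins, and the "concurrent" registration is delivered explicitly between the two teardown steps so the interleaving is deterministic. It only shows why nulling the field while the tracker can still deliver events leads to the NPE, and that closing the tracker first avoids it in this model.

```java
/** Hypothetical, simplified model of the teardown order; not the Felix classes. */
final class TeardownOrderSketch {

    /** Stand-in for the shared servlet context; only the path matters here. */
    static final class WebContext {
        private final String contextPath;
        WebContext(String contextPath) { this.contextPath = contextPath; }
        String getContextPath() { return contextPath; }
    }

    private volatile WebContext webContext = new WebContext("/");
    private volatile boolean trackerOpen = true;

    /** Models the tracker callback: events arrive only while the tracker is open. */
    String onContextHelperAdded() {
        if (!trackerOpen) {
            return "ignored";                  // a closed tracker delivers nothing
        }
        WebContext ctx = webContext;           // may already be null during teardown
        return "activated at " + ctx.getContextPath(); // throws NPE if ctx is null
    }

    /** Order described in the report: field nulled while the tracker is still open. */
    String stopNullFirst() {
        webContext = null;
        String result = deliverInterleavedRegistration();
        trackerOpen = false;
        return result;
    }

    /** Alternative order: close the tracker first, then drop the field. */
    String stopCloseFirst() {
        trackerOpen = false;
        String result = deliverInterleavedRegistration();
        webContext = null;
        return result;
    }

    /** Simulates a registration from another thread landing mid-teardown. */
    private String deliverInterleavedRegistration() {
        try {
            return onContextHelperAdded();
        } catch (NullPointerException e) {
            return "NPE";
        }
    }

    public static void main(String[] args) {
        System.out.println(new TeardownOrderSketch().stopNullFirst());  // NPE
        System.out.println(new TeardownOrderSketch().stopCloseFirst()); // ignored
    }
}
```

Whether reordering alone is enough in `WhiteboardManager` depends on what the tracker close callbacks themselves need from `webContext`, which this model does not cover.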

The same teardown cascade exposes a second race in
`WhiteboardManager.deactivate()`: a TOCTOU between the existing
`if (handler.getRegistry() != null)` check and the subsequent
`handler.getRegistry().getEventListenerRegistry()...` call. In addition,
several unguarded `handler.getRegistry()` chains in
`register/unregisterWhiteboardService`, `addWhiteboardService`,
`removeWhiteboardService` and `sessionIdChanged` throw an NPE if the
registry has just been nulled.
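The check-then-act pattern behind these NPEs can be shown in isolation. The sketch below is an assumption-laden model, not the Felix code: `RegistrySnapshotSketch`, `Registry`, `Handler` and the `onNextRead` hook are hypothetical, and the hook stands in for another thread nulling the registry right after the first read. It contrasts reading the getter twice with reading it once into a local variable.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical model of the getRegistry() TOCTOU; not the Felix classes. */
final class RegistrySnapshotSketch {

    static final class Registry {
        final AtomicInteger removals = new AtomicInteger();
        void removeListeners() { removals.incrementAndGet(); }
    }

    static final class Handler {
        private volatile Registry registry = new Registry();
        private Runnable afterNextRead = () -> {};

        /** Returns the current registry, then runs the one-shot hook (if any). */
        Registry getRegistry() {
            Registry r = registry;
            Runnable hook = afterNextRead;
            afterNextRead = () -> {};
            hook.run();                 // simulates a concurrent deactivate()
            return r;
        }

        void deactivate() { registry = null; }

        /** Schedules an action to run right after the next getRegistry() read. */
        void onNextRead(Runnable action) { afterNextRead = action; }
    }

    /** Two reads: the second can observe null even though the check passed. */
    static boolean deactivateUnsafe(Handler h) {
        if (h.getRegistry() != null) {
            h.getRegistry().removeListeners();
            return true;
        }
        return false;
    }

    /** One read into a local: the check and the use see the same reference. */
    static boolean deactivateWithSnapshot(Handler h) {
        Registry r = h.getRegistry();
        if (r != null) {
            r.removeListeners();
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Handler a = new Handler();
        a.onNextRead(a::deactivate);
        try {
            deactivateUnsafe(a);
            System.out.println("unsafe: ok");
        } catch (NullPointerException e) {
            System.out.println("unsafe: NPE");
        }

        Handler b = new Handler();
        b.onNextRead(b::deactivate);
        System.out.println("snapshot: " + deactivateWithSnapshot(b));
    }
}
```

The snapshot variant only removes the NPE; the caller may still be operating on a registry that has since been detached, so any real fix would also need to decide whether that is acceptable.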

*Logs*

*ERROR* org.apache.felix.http: Exception during controller unregister
java.lang.NullPointerException: Cannot invoke
  "PerContextHandlerRegistry.getEventListenerRegistry()" because
  the return value of "WhiteboardContextHandler.getRegistry()" is null
    at WhiteboardManager.deactivate(WhiteboardManager.java:340)
    at WhiteboardManager.removeContextHelper(WhiteboardManager.java:462)
    at ServletContextHelperTracker.removed(ServletContextHelperTracker.java:106)
    ...
    at WhiteboardManager.stop(WhiteboardManager.java:202)
    at HttpServiceController.unregister(HttpServiceController.java:158)
    at JettyService.stopJetty(JettyService.java:230)
    at JettyService.updated(JettyService.java:206)
    at JettyManagedService.updated(JettyManagedService.java:38)
    at ConfigurationManager$UpdateConfiguration.run(ConfigurationManager.java:1418)

java.lang.NullPointerException: Cannot invoke
  "jakarta.servlet.ServletContext.getContextPath()" because
  "webContext" is null
    at SharedServletContextImpl.<init>(SharedServletContextImpl.java:86)
    at WhiteboardContextHandler.activate(WhiteboardContextHandler.java:94)
    at WhiteboardManager.activate(WhiteboardManager.java:253)
    at WhiteboardManager.addContextHelper(WhiteboardManager.java:369)
    at ServletContextHelperTracker.addingService(...)





--
This message was sent by Atlassian Jira
(v8.20.10#820010)
