[
https://issues.apache.org/jira/browse/FELIX-6844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18089653#comment-18089653
]
Tom Watson commented on FELIX-6844:
-----------------------------------
I don't typically look at Felix framework code much, but I am concerned that
the event dispatching thread doesn't catch `Throwable` and continue on to the
next listener. Things like `OutOfMemoryError` and `StackOverflowError` should
not cause the thread to die as long as the stack unwinds to nearly the root,
which is what I would assume should happen with the event dispatching thread.
If a deployment hits this kind of error often, there is most likely a
misbehaving listener installed. Perhaps a listener that repeatedly throws
Errors should be put on a block list and not called any more.
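
To make the suggestion concrete, here is a minimal sketch of a dispatch loop
that catches `Throwable` per listener and blocks a listener after repeated
failures. This is not the Felix `EventDispatcher` code: the class name
`GuardedDispatcher`, the use of `Consumer` as the listener type, and the
threshold of 3 consecutive failures are all assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/** Hypothetical sketch only; not the Felix EventDispatcher implementation. */
final class GuardedDispatcher<E> {
    // Assumed threshold: consecutive failures before a listener is blocked.
    private static final int MAX_CONSECUTIVE_FAILURES = 3;

    private final List<Consumer<E>> listeners = new CopyOnWriteArrayList<>();
    private final Map<Consumer<E>, Integer> failures = new ConcurrentHashMap<>();
    private final Set<Consumer<E>> blocked = ConcurrentHashMap.newKeySet();

    void addListener(Consumer<E> listener) {
        listeners.add(listener);
    }

    boolean isBlocked(Consumer<E> listener) {
        return blocked.contains(listener);
    }

    /** Delivers one event to every listener that is not on the block list. */
    void dispatch(E event) {
        for (Consumer<E> listener : listeners) {
            if (blocked.contains(listener)) {
                continue;
            }
            try {
                listener.accept(event);
                failures.remove(listener); // a success resets the failure count
            } catch (Throwable t) {
                // Catching Throwable (not just Exception) means an Error thrown
                // by one listener does not escape and end the dispatch thread.
                int count = failures.merge(listener, 1, Integer::sum);
                if (count >= MAX_CONSECUTIVE_FAILURES) {
                    blocked.add(listener);
                }
            }
        }
    }
}
```

A real implementation would also log the `Throwable`, and would have to decide
whether a blocked listener is ever given another chance.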
> Implement self-healing recovery mechanism for FelixDispatchQueue thread in
> EventDispatcher
> ------------------------------------------------------------------------------------------
>
> Key: FELIX-6844
> URL: https://issues.apache.org/jira/browse/FELIX-6844
> Project: Felix
> Issue Type: Bug
> Components: Framework
> Reporter: sahvx655-wq
> Priority: Major
>
> The {{EventDispatcher}} uses a single background thread
> ({{FelixDispatchQueue}}) for asynchronous event delivery. If this thread
> unexpectedly terminates due to runtime errors (e.g.,
> {{OutOfMemoryError}}, {{StackOverflowError}}, or unhandled
> exceptions), asynchronous event processing permanently stops, leaving the
> system in a degraded state with no automatic recovery.
> This change introduces a controlled self-healing mechanism to detect a dead
> dispatcher thread and restart it safely.
> If {{FelixDispatchQueue}} is found to be non-alive during
> {{fireEventAsynchronously()}}, the system triggers a controlled restart
> process. To avoid restart loops, recovery is limited using a maximum retry
> count, cooldown period, and backoff delay. If the retry limit is exceeded,
> auto-recovery is disabled and an error is logged to prevent further resource
> exhaustion.
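
The restart limits described in the issue could be sketched roughly as
follows. The issue does not spell out how the retry count, cooldown period, and
backoff delay interact, so this sketch assumes one plausible reading: the
backoff doubles with each restart, and the retry count resets once a full
cooldown has passed since the last restart. The class name `RestartPolicy`,
its method, and its parameters are hypothetical and do not come from the Felix
code base.

```java
/** Hypothetical restart policy; names and semantics are assumptions. */
final class RestartPolicy {
    private final int maxRetries;
    private final long cooldownMillis;
    private final long baseBackoffMillis;

    private int retries;
    private long lastRestartAt;
    private boolean disabled;

    RestartPolicy(int maxRetries, long cooldownMillis, long baseBackoffMillis) {
        this.maxRetries = maxRetries;
        this.cooldownMillis = cooldownMillis;
        this.baseBackoffMillis = baseBackoffMillis;
    }

    /**
     * Called when the dispatcher thread is found dead.
     * Returns the delay in milliseconds to wait before restarting it,
     * or -1 if auto-recovery is disabled.
     */
    synchronized long nextRestartDelay(long nowMillis) {
        if (disabled) {
            return -1;
        }
        // Assumed cooldown semantics: surviving a full cooldown since the
        // last restart clears the record of earlier failures.
        if (retries > 0 && nowMillis - lastRestartAt >= cooldownMillis) {
            retries = 0;
        }
        if (retries >= maxRetries) {
            disabled = true; // the caller should log an error at this point
            return -1;
        }
        long delay = baseBackoffMillis << retries; // exponential backoff
        retries++;
        lastRestartAt = nowMillis;
        return delay;
    }
}
```

A caller would check whether the dispatcher thread is alive (for example with
`Thread.isAlive()`), ask the policy for a delay, and start a new thread only
when the result is not -1.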