[ 
https://issues.apache.org/jira/browse/FELIX-6844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18089653#comment-18089653
 ] 

Tom Watson commented on FELIX-6844:
-----------------------------------

I don't usually look at the Felix framework code, but I am concerned that 
the event dispatching thread doesn't catch `Throwable` and continue on to the 
next listener. Errors such as `OutOfMemoryError` and `StackOverflowError` 
should not kill the thread as long as the stack unwinds to nearly the root, 
which is what I would expect to happen on the event dispatching thread.
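
As a rough illustration of catching `Throwable` per listener, here is a minimal 
sketch. `SafeDispatch` and `deliver` are hypothetical names, and 
`java.util.function.Consumer` stands in for whatever listener type the 
framework actually uses; this is not the Felix `EventDispatcher` code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for a dispatch loop; not Felix's EventDispatcher.
final class SafeDispatch {

    /**
     * Delivers the event to every listener in order and returns the
     * Throwables that were caught along the way.
     */
    static <E> List<Throwable> deliver(E event, List<Consumer<E>> listeners) {
        List<Throwable> failures = new ArrayList<>();
        for (Consumer<E> listener : listeners) {
            try {
                listener.accept(event);
            } catch (Throwable t) {
                // The listener's frames have been unwound by the time we get
                // here, so the loop can usually carry on with the next
                // listener, even after a StackOverflowError.
                failures.add(t);
            }
        }
        return failures;
    }
}
```

A real dispatcher would probably log each failure rather than collect it; the 
returned list just makes the behaviour easy to observe.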

If a deployment hits this kind of error often, there is most likely a 
misbehaving listener installed. Perhaps a listener that throws Errors 
repeatedly should be put on a block list and not called any more. 
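
One way the block list idea could look, as a sketch only: 
`ListenerBlockList` and its `maxErrors` threshold are assumptions made up for 
illustration, not existing Felix API, and listeners are keyed by 
`equals`/`hashCode` here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper; the name and the threshold are assumptions.
final class ListenerBlockList {
    private final int maxErrors;
    private final Map<Object, AtomicInteger> errorCounts = new ConcurrentHashMap<>();

    ListenerBlockList(int maxErrors) {
        if (maxErrors < 1) {
            throw new IllegalArgumentException("maxErrors must be >= 1");
        }
        this.maxErrors = maxErrors;
    }

    /** Records one Error thrown by the listener; returns true once it is blocked. */
    boolean recordError(Object listener) {
        int count = errorCounts
                .computeIfAbsent(listener, k -> new AtomicInteger())
                .incrementAndGet();
        return count >= maxErrors;
    }

    /** True if the listener has reached the threshold and should be skipped. */
    boolean isBlocked(Object listener) {
        AtomicInteger count = errorCounts.get(listener);
        return count != null && count.get() >= maxErrors;
    }

    /** Forgets the listener, for example when it is unregistered. */
    void forget(Object listener) {
        errorCounts.remove(listener);
    }
}
```

The dispatch loop would check `isBlocked` before calling a listener and call 
`recordError` from its `catch` block. Whether a blocked listener should ever 
be given another chance is left open here.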

> Implement self-healing recovery mechanism for FelixDispatchQueue thread in 
> EventDispatcher
> ------------------------------------------------------------------------------------------
>
>                 Key: FELIX-6844
>                 URL: https://issues.apache.org/jira/browse/FELIX-6844
>             Project: Felix
>          Issue Type: Bug
>          Components: Framework
>            Reporter: sahvx655-wq
>            Priority: Major
>
> The {{EventDispatcher}} uses a single background thread 
> ({{FelixDispatchQueue}}) for asynchronous event delivery. If this thread 
> unexpectedly terminates due to runtime errors (e.g., 
> {{OutOfMemoryError}}, {{StackOverflowError}}, or unhandled 
> exceptions), asynchronous event processing permanently stops, leaving the 
> system in a degraded state with no automatic recovery.
> This change introduces a controlled self-healing mechanism to detect a dead 
> dispatcher thread and restart it safely.
> If {{FelixDispatchQueue}} is found to be non-alive during 
> {{fireEventAsynchronously()}}, the system triggers a controlled restart 
> process. To avoid restart loops, recovery is limited using a maximum retry 
> count, cooldown period, and backoff delay. If the retry limit is exceeded, 
> auto-recovery is disabled and an error is logged to prevent further resource 
> exhaustion.
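
For reference, the restart limits described in the quoted issue (maximum retry 
count, cooldown period, backoff delay) could be modelled roughly as follows. 
The issue does not define the exact semantics, so everything here is an 
assumed interpretation: `RestartPolicy` is a hypothetical name, the backoff 
doubles after each restart, and the attempt counter resets once the cooldown 
has passed without another restart.

```java
// Hypothetical policy object; names and semantics are assumptions.
final class RestartPolicy {
    private final int maxRetries;
    private final long cooldownMillis;
    private final long baseBackoffMillis;

    private int attempts = 0;
    private long lastRestartMillis = 0;
    private boolean disabled = false;

    RestartPolicy(int maxRetries, long cooldownMillis, long baseBackoffMillis) {
        this.maxRetries = maxRetries;
        this.cooldownMillis = cooldownMillis;
        this.baseBackoffMillis = baseBackoffMillis;
    }

    /** Returns true if the caller may restart the dispatcher thread now. */
    synchronized boolean tryRestart(long nowMillis) {
        if (disabled) {
            return false;
        }
        long sinceLast = nowMillis - lastRestartMillis;
        if (attempts > 0 && sinceLast >= cooldownMillis) {
            attempts = 0; // quiet for a full cooldown period: start over
        }
        if (attempts >= maxRetries) {
            disabled = true; // retry limit exceeded: stop auto-recovery
            return false;
        }
        if (attempts > 0) {
            // Exponential backoff, with the shift capped to avoid overflow.
            long backoff = baseBackoffMillis << Math.min(attempts - 1, 20);
            if (sinceLast < backoff) {
                return false; // too soon after the previous restart
            }
        }
        attempts++;
        lastRestartMillis = nowMillis;
        return true;
    }

    synchronized boolean isDisabled() {
        return disabled;
    }
}
```

Passing the clock value in as a parameter is only a convenience for testing; 
a caller would typically supply a monotonic time source.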



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
