Krishna Mewara created FINERACT-2449:
----------------------------------------

             Summary: Rework Async Threading Model to support Context 
Propagation and Pooling
                 Key: FINERACT-2449
                 URL: https://issues.apache.org/jira/browse/FINERACT-2449
             Project: Apache Fineract
          Issue Type: Improvement
          Components: Performance
    Affects Versions: 1.14.0
         Environment: Reproducible on all environments (Local Development, 
Docker, and Kubernetes).

Platform:
- OS: Linux / macOS / Windows (OS Agnostic)
- Java Version: 17+ (Standard Fineract Runtime)
- Framework: Spring Boot / Apache Fineract 1.x -> 1.14

Infrastructure:
- Issue is critical in Containerized/Kubernetes environments where thread 
exhaustion leads to Pod Eviction/OOMKilled.
- High concurrency scenarios (>200 concurrent users) trigger the thread 
accumulation.
            Reporter: Krishna Mewara


*Background* The current {{SimpleAsyncTaskExecutor}} creates a new thread for 
every task, which can lead to thread exhaustion (accumulating thousands of 
threads) under load, as reported in FINERACT-1934.

*The Issue* Simply replacing the executor with a {{ThreadPoolTaskExecutor}} or 
applying a concurrency cap to the existing executor causes the application to 
crash during startup.

*Technical Constraints* Investigation reveals that the current architecture 
relies heavily on {{InheritableThreadLocal}} for security context propagation, 
particularly during the boot process (Liquibase execution and initial event 
multicasting).

As noted in the source code comments:
{quote}
// The application events (for importing) rely on the inheritable thread local security context strategy
// This is NOT compatible with threadpools so if we use threadpools the below will need to be reworked
{quote}
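The incompatibility described in that comment can be reproduced with the JDK alone. The sketch below is illustrative only and is not Fineract code: {{CONTEXT}} is a hypothetical stand-in for an inheritable security context holder. It shows that an {{InheritableThreadLocal}} value is copied once, when the worker thread is created, so a reused pool thread keeps the value of whichever caller happened to create it.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative only: CONTEXT is a hypothetical stand-in for a security context holder.
class InheritedContextDemo {
    static final InheritableThreadLocal<String> CONTEXT = new InheritableThreadLocal<>();

    // Runs a task on the executor and returns the context value that task observed.
    static String observe(ExecutorService executor) {
        Callable<String> readContext = CONTEXT::get;
        try {
            return executor.submit(readContext).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static String[] run() {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            CONTEXT.set("tenant-A");
            String first = observe(pool);  // worker is created here and inherits "tenant-A"
            CONTEXT.set("tenant-B");
            String second = observe(pool); // same worker is reused, nothing is re-inherited
            return new String[] {first, second};
        } finally {
            pool.shutdown();
            CONTEXT.remove();
        }
    }

    public static void main(String[] args) {
        String[] seen = run();
        System.out.println("first task saw " + seen[0] + ", second task saw " + seen[1]);
    }
}
```

Both tasks observe the value that was current when the worker thread was created, even though the caller changed its context before the second submission. A thread-per-task executor hides this because every task gets a freshly created thread.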
*Attempts & Failures*
 # *ThreadPoolTaskExecutor:* Caused Liquibase context failures because the 
security context was not correctly propagated to reused threads.

 # *TaskDecorator:* Attempted manual context propagation, but copying every 
context (MDC, Tenancy, Transaction) by hand proved brittle and blocked 
startup events.

 # *Concurrency Cap:* Setting a concurrency limit on the 
{{SimpleAsyncTaskExecutor}} caused deadlocks/timeouts during startup because 
the boot process requires high parallelism (100+ concurrent threads).
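One plausible mechanism for the third failure is sketched below with the JDK alone. It rests on an assumption that is not confirmed in this report: that startup tasks wait on other tasks submitted to the same executor. The class and method names are hypothetical. When every permitted thread is held by a waiting parent task, the child tasks never get a thread and the parents time out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative only: two parent tasks each wait on a child task in the same capped pool.
class CappedExecutorStarvationDemo {

    // Returns how many of the two parents timed out waiting for their child (cap must be >= 2).
    static int timedOutParents(int cap) {
        final int parents = 2;
        ExecutorService pool = Executors.newFixedThreadPool(cap);
        CountDownLatch allStarted = new CountDownLatch(parents);
        CountDownLatch allWaited = new CountDownLatch(parents);
        List<Future<Boolean>> results = new ArrayList<>();
        try {
            for (int i = 0; i < parents; i++) {
                Callable<Boolean> parent = () -> {
                    allStarted.countDown();
                    allStarted.await();            // every parent now holds a thread
                    Future<?> child = pool.submit(() -> { });
                    boolean timedOut = false;
                    try {
                        child.get(200, TimeUnit.MILLISECONDS);
                    } catch (TimeoutException e) {
                        timedOut = true;           // the child never got a thread
                    }
                    allWaited.countDown();
                    allWaited.await();             // keep the thread until all parents are done waiting
                    return timedOut;
                };
                results.add(pool.submit(parent));
            }
            int count = 0;
            for (Future<Boolean> result : results) {
                if (result.get()) {
                    count++;
                }
            }
            return count;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("cap=2 timed out: " + timedOutParents(2));
        System.out.println("cap=4 timed out: " + timedOutParents(4));
    }
}
```

With a cap equal to the number of waiting parents, both parents time out; with headroom for the children, neither does. If the boot process has this shape, any cap below its peak nesting depth would stall in the same way.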

*Proposed Improvement* We need to redesign the threading model to:
 # Decouple the startup/bootstrapping phase (which may require unbounded 
threads) from the runtime phase.

 # Implement a safe mechanism for Context Propagation that is compatible with 
pooling (e.g., using {{TransmittableThreadLocal}} or a robust 
{{TaskDecorator}}).

 # Migrate to a bounded {{ThreadPoolTaskExecutor}} for runtime tasks to prevent 
resource exhaustion.
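A minimal sketch of points 2 and 3, using only the JDK so that it runs standalone. It is not a proposed patch: {{CONTEXT}}, {{withContext}}, and the pool sizes are hypothetical, and only one context is propagated, whereas Fineract would also need the security context, tenant, and MDC. The wrapper captures the submitter's context at submission time and installs it on the pooled worker for the duration of the task. In Spring, the same shape would be a {{TaskDecorator}} set on a {{ThreadPoolTaskExecutor}}.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Illustrative only: CONTEXT is a hypothetical stand-in for security/tenant/MDC state.
class ContextPropagationSketch {
    static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    // Captures the caller's context now and installs it around the task when it runs.
    static <T> Callable<T> withContext(Callable<T> task) {
        String captured = CONTEXT.get();         // evaluated on the submitting thread
        return () -> {
            String previous = CONTEXT.get();     // evaluated on the thread that runs the task
            CONTEXT.set(captured);
            try {
                return task.call();
            } finally {
                // Restore the worker's prior state so nothing leaks into the next task.
                if (previous == null) {
                    CONTEXT.remove();
                } else {
                    CONTEXT.set(previous);
                }
            }
        };
    }

    static String[] run() {
        // Bounded pool: 2 threads, at most 10 queued tasks, caller runs the task when saturated.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 2, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(10),
                new ThreadPoolExecutor.CallerRunsPolicy());
        Callable<String> readContext = CONTEXT::get;
        try {
            CONTEXT.set("tenant-A");
            String first = pool.submit(withContext(readContext)).get();
            CONTEXT.set("tenant-B");
            String second = pool.submit(withContext(readContext)).get();
            return new String[] {first, second};
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
            CONTEXT.remove();
        }
    }

    public static void main(String[] args) {
        String[] seen = run();
        System.out.println("first task saw " + seen[0] + ", second task saw " + seen[1]);
    }
}
```

Each task sees the context of the caller that submitted it, regardless of which pooled thread runs it. The rejection policy and queue size here are placeholders; whether the startup phase can share this pool or needs its own executor (point 1) is left open.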



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
