[ 
https://issues.apache.org/jira/browse/FINERACT-2449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krishna Mewara updated FINERACT-2449:
-------------------------------------
    Description: 
*Background* The current {{SimpleAsyncTaskExecutor}} creates a new thread for 
every task. While effective for light loads, this unbounded behavior poses a 
theoretical risk of thread exhaustion (OOM) under specific high-concurrency 
scenarios (as originally reported in FINERACT-1934).

*Motivation for Change* Although the specific "thread explosion" from 
FINERACT-1934 is difficult to reproduce in standard local/CI environments, 
relying on an unbounded executor is contrary to Spring Boot best practices for 
production-grade financial systems.

*Goal* Proactively replace the unbounded {{SimpleAsyncTaskExecutor}} with a 
bounded, configurable {{ThreadPoolTaskExecutor}}. This caps the number of 
threads the executor can create, so thread usage stays predictable 
regardless of load.
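
As a rough illustration of what "bounded and configurable" could look like, the sketch below builds a pool sized from the CPU count, using the multipliers listed under Proposed Solution. It uses the JDK's {{ThreadPoolExecutor}} directly so that it runs without Spring on the classpath. The class name, helper method, queue capacity, keep-alive time and rejection policy are all assumptions made for the example and are not taken from the Fineract code base.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class BoundedExecutorSketch {

    // Hypothetical helper: builds a pool whose size is derived from the CPU count.
    static ThreadPoolExecutor newBoundedPool(int queueCapacity) {
        int cpus = Runtime.getRuntime().availableProcessors();
        int core = cpus * 2; // core size multiplier from the proposal
        int max = cpus * 5;  // max size multiplier from the proposal
        return new ThreadPoolExecutor(
                core, max,
                60L, TimeUnit.SECONDS,                      // idle timeout for threads above core (assumed)
                new ArrayBlockingQueue<>(queueCapacity),    // bounded queue; the capacity is an assumption
                new ThreadPoolExecutor.CallerRunsPolicy()); // run on the caller when saturated (assumed)
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = newBoundedPool(100);
        System.out.println("core=" + pool.getCorePoolSize() + " max=" + pool.getMaximumPoolSize());
        pool.shutdown();
    }
}
```

With Spring's {{ThreadPoolTaskExecutor}}, the same settings map to {{setCorePoolSize}}, {{setMaxPoolSize}} and {{setQueueCapacity}}. Note that the pool only grows beyond the core size once the queue is full, so with an effectively unbounded queue the max size is never reached.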

*Proposed Solution*
 # Replace {{SimpleAsyncTaskExecutor}} with {{ThreadPoolTaskExecutor}}.

 # Configure safe defaults (Core: CPU x 2, Max: CPU x 5).

 # Implement unit tests (using {{CountDownLatch}} patterns) to verify that 
the new pool respects its bounds while maintaining parallelism.

 # Ensure compatibility with all Fineract Modes (Read/Write/Batch).
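
One possible shape for the {{CountDownLatch}} check in step 3 is sketched below. It is not the test from the actual patch. It submits as many blocking tasks as the pool can accept, waits on a latch until {{max}} of them are running together (parallelism), and records the peak number of concurrently running tasks (bound). The class and method names are hypothetical, and the JDK {{ThreadPoolExecutor}} stands in for the Spring wrapper.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class BoundedPoolLatchCheck {

    /** Returns the peak number of tasks observed running at the same time. */
    static int peakConcurrency(int core, int max, int queueCapacity) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                core, max, 30L, TimeUnit.SECONDS, new ArrayBlockingQueue<>(queueCapacity));
        int tasks = max + queueCapacity;               // the most the pool accepts without rejecting
        CountDownLatch allWorkersBusy = new CountDownLatch(max);
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(tasks);
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    int now = active.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max); // remember the highest concurrency seen
                    allWorkersBusy.countDown();
                    try {
                        release.await();                   // hold the worker until the check is done
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        active.decrementAndGet();
                        done.countDown();
                    }
                });
            }
            // Parallelism: `max` tasks must be running together before anything is released.
            if (!allWorkersBusy.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("pool never reached max parallelism");
            }
            release.countDown();
            if (!done.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
            return peak.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown();                           // never leave workers blocked
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("peak concurrency = " + peakConcurrency(2, 4, 2));
    }
}
```

Because the tasks block until released, the queue fills before the pool grows past its core size, which is what lets the check reach {{max}} running tasks deterministically rather than by timing.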

  was:
*Background* The current {{SimpleAsyncTaskExecutor}} creates a new thread for 
every task, which can lead to thread exhaustion (accumulating thousands of 
threads) under load, as reported in FINERACT-1934.

*The Issue* Simply replacing the executor with a {{ThreadPoolTaskExecutor}} or 
applying a concurrency cap to the existing executor causes the application to 
crash during startup.

*Technical Constraints* Investigation reveals that the current architecture 
relies heavily on {{InheritableThreadLocal}} for security context propagation, 
particularly during the boot process (Liquibase execution and initial event 
multicasting).

As noted in the source code comments:
{quote}// The application events (for importing) rely on the inheritable thread 
local security context strategy // This is NOT compatible with threadpools so 
if we use threadpools the below will need to be reworked
{quote}
*Attempts & Failures*
 # *ThreadPoolTaskExecutor:* Caused Liquibase context failures because the 
security context was not correctly propagated to reused threads.

 # *TaskDecorator:* Attempted manual context propagation, but the complexity of 
copying all contexts (MDC, Tenancy, Transaction) proved brittle and blocked 
startup events.

 # *Concurrency Cap:* Limiting the {{SimpleAsyncTaskExecutor}} caused 
deadlocks/timeouts during startup because the boot process requires high 
parallelism (100+ concurrent threads).
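
The incompatibility behind the first failure can be reproduced in isolation. An {{InheritableThreadLocal}} value is copied into a child thread only when that thread is created, so a pooled worker keeps the value it inherited and never sees later changes on the submitting thread. The sketch below shows this with a single-thread pool; the {{CONTEXT}} holder and the tenant strings are hypothetical stand-ins for the security context, not Fineract classes.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class InheritableContextDemo {

    // Hypothetical stand-in for the inheritable security/tenant context.
    static final InheritableThreadLocal<String> CONTEXT = new InheritableThreadLocal<>();

    /** Returns what the pooled worker saw for the first and the second task. */
    static String[] observe() {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        Callable<String> read = CONTEXT::get;
        try {
            CONTEXT.set("tenant-A");
            String first = pool.submit(read).get();  // worker is created here and inherits "tenant-A"
            CONTEXT.set("tenant-B");
            String second = pool.submit(read).get(); // same worker is reused; no new inheritance happens
            return new String[] {first, second};
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            CONTEXT.remove();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        String[] seen = observe();
        System.out.println("first=" + seen[0] + " second=" + seen[1]);
    }
}
```

Both reads return the value that was current when the worker thread was created. With {{SimpleAsyncTaskExecutor}} every task gets a fresh thread, which is why the inheritable strategy works there and breaks once threads are reused.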

*Proposed Improvement* We need to redesign the threading model to:
 # Decouple the startup/bootstrapping phase (which may require unbounded 
threads) from the runtime phase.

 # Implement a safe mechanism for Context Propagation that is compatible with 
pooling (e.g., using {{TransmittableThreadLocal}} or a robust 
{{TaskDecorator}}).

 # Migrate to a bounded {{ThreadPoolTaskExecutor}} for runtime tasks to prevent 
resource exhaustion.
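
A minimal sketch of the decorator approach from item 2, assuming a single context value: the context is captured on the submitting thread, installed on the pooled worker for the duration of the task, and restored afterwards. The static {{decorate}} method has the same shape as Spring's {{TaskDecorator#decorate(Runnable)}}, but the class, the {{CONTEXT}} holder and the tenant strings are hypothetical. A real implementation would also have to carry the other contexts mentioned above (MDC, tenancy, transaction), which is where the brittleness was reported.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ContextPropagationSketch {

    // Hypothetical stand-in for the per-request context; a plain ThreadLocal is enough here.
    static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    // Same shape as Spring's TaskDecorator#decorate(Runnable).
    static Runnable decorate(Runnable task) {
        String captured = CONTEXT.get();             // read on the submitting thread
        return () -> {
            String previous = CONTEXT.get();         // whatever the pooled worker held before
            CONTEXT.set(captured);
            try {
                task.run();
            } finally {
                if (previous == null) {
                    CONTEXT.remove();                // do not leak the context into the next task
                } else {
                    CONTEXT.set(previous);
                }
            }
        };
    }

    /** Returns what a reused pooled worker saw for two tasks submitted under different contexts. */
    static String[] observe() {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        String[] seen = new String[2];
        try {
            CONTEXT.set("tenant-A");
            pool.submit(decorate(() -> seen[0] = CONTEXT.get())).get();
            CONTEXT.set("tenant-B");
            pool.submit(decorate(() -> seen[1] = CONTEXT.get())).get();
            return seen;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            CONTEXT.remove();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        String[] seen = observe();
        System.out.println("first=" + seen[0] + " second=" + seen[1]);
    }
}
```

Unlike the inheritable strategy, each task here sees the context that was current when it was submitted, even though the same worker thread runs both tasks.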

        Summary: Replace unbounded SimpleAsyncTaskExecutor with bounded 
ThreadPoolTaskExecutor  (was: Rework Async Threading Model to support Context 
Propagation and Pooling)

> Replace unbounded SimpleAsyncTaskExecutor with bounded ThreadPoolTaskExecutor
> -----------------------------------------------------------------------------
>
>                 Key: FINERACT-2449
>                 URL: https://issues.apache.org/jira/browse/FINERACT-2449
>             Project: Apache Fineract
>          Issue Type: Improvement
>          Components: Performance
>    Affects Versions: 1.14.0
>         Environment: Reproducible on all environments (Local Development, 
> Docker, and Kubernetes).
> Platform:
> - OS: Linux / macOS / Windows (OS Agnostic)
> - Java Version: 17+ (Standard Fineract Runtime)
> - Framework: Spring Boot / Apache Fineract 1.x -> 1.14
> Infrastructure:
> - Issue is critical in Containerized/Kubernetes environments where thread 
> exhaustion leads to Pod Eviction/OOMKilled.
> - High concurrency scenarios (>200 concurrent users) trigger the thread 
> accumulation.
>            Reporter: Krishna Mewara
>            Priority: Major
>              Labels: improvement, performance, technical-debt, threading
>
> *Background* The current {{SimpleAsyncTaskExecutor}} creates a new thread for 
> every task. While effective for light loads, this unbounded behavior poses a 
> theoretical risk of thread exhaustion (OOM) under specific high-concurrency 
> scenarios (as originally reported in FINERACT-1934).
> *Motivation for Change* Although the specific "thread explosion" from 
> FINERACT-1934 is difficult to reproduce in standard local/CI environments, 
> relying on an unbounded executor is contrary to Spring Boot best practices 
> for production-grade financial systems.
> *Goal* Proactively replace the unbounded {{SimpleAsyncTaskExecutor}} with a 
> bounded, configurable {{ThreadPoolTaskExecutor}}. This caps the number of 
> threads the executor can create, so thread usage stays predictable 
> regardless of load.
> *Proposed Solution*
>  # Replace {{SimpleAsyncTaskExecutor}} with {{ThreadPoolTaskExecutor}}.
>  # Configure safe defaults (Core: CPU x 2, Max: CPU x 5).
>  # Implement unit tests (using {{CountDownLatch}} patterns) to verify that 
> the new pool respects its bounds while maintaining parallelism.
>  # Ensure compatibility with all Fineract Modes (Read/Write/Batch).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
