Krishna Mewara created FINERACT-2449:
----------------------------------------
Summary: Rework Async Threading Model to support Context
Propagation and Pooling
Key: FINERACT-2449
URL: https://issues.apache.org/jira/browse/FINERACT-2449
Project: Apache Fineract
Issue Type: Improvement
Components: Performance
Affects Versions: 1.14.0
Environment: Reproducible on all environments (Local Development,
Docker, and Kubernetes).
Platform:
- OS: Linux / macOS / Windows (OS Agnostic)
- Java Version: 17+ (Standard Fineract Runtime)
- Framework: Spring Boot / Apache Fineract 1.x -> 1.14
Infrastructure:
- The issue is critical in containerized/Kubernetes environments, where thread
exhaustion leads to pod eviction or OOMKilled containers.
- High concurrency scenarios (>200 concurrent users) trigger the thread
accumulation.
Reporter: Krishna Mewara
*Background* The current {{SimpleAsyncTaskExecutor}} creates a new thread for
every task, which can lead to thread exhaustion (accumulating thousands of
threads) under load, as reported in FINERACT-1934.
*The Issue* Simply replacing the executor with a {{ThreadPoolTaskExecutor}} or
applying a concurrency cap to the existing executor causes the application to
crash during startup.
*Technical Constraints* Investigation reveals that the current architecture
relies heavily on {{InheritableThreadLocal}} for security context propagation,
particularly during the boot process (Liquibase execution and initial event
multicasting).
As noted in the source code comments:
{quote}// The application events (for importing) rely on the inheritable thread local security context strategy
// This is NOT compatible with threadpools so if we use threadpools the below will need to be reworked
{quote}
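The incompatibility described in that comment can be reproduced with the JDK alone. The sketch below is illustrative only: {{CONTEXT}} and the tenant strings are hypothetical stand-ins, not Fineract's actual security context holder. It relies on the fact that an {{InheritableThreadLocal}} value is copied once, when the child thread is constructed, so a reused pool thread keeps the value it inherited at creation.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class InheritableContextDemo {
    // Hypothetical stand-in for a security/tenant context holder.
    static final InheritableThreadLocal<String> CONTEXT = new InheritableThreadLocal<>();

    /** Thread-per-task style: a brand-new thread inherits the caller's current value. */
    static String readFromNewThread(String value) {
        CONTEXT.set(value);
        String[] seen = new String[1];
        Thread t = new Thread(() -> seen[0] = CONTEXT.get());
        t.start();
        try {
            t.join(); // join() makes the write to seen[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            CONTEXT.remove();
        }
        return seen[0];
    }

    /** Pooled style: the single worker is created under "tenant-A" and then reused. */
    static String readFromReusedPoolThread() {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            CONTEXT.set("tenant-A");
            Runnable warmUp = () -> { };
            pool.submit(warmUp).get();      // worker thread is created here and inherits "tenant-A"
            CONTEXT.set("tenant-B");        // the submitter switches context
            Callable<String> read = CONTEXT::get;
            return pool.submit(read).get(); // same worker, still holding the inherited value
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
            CONTEXT.remove();
        }
    }

    public static void main(String[] args) {
        System.out.println("new thread sees:    " + readFromNewThread("tenant-B"));
        System.out.println("pooled thread sees: " + readFromReusedPoolThread());
    }
}
```

In this sketch the new thread observes the submitter's current value, while the pooled worker still reports the value from when it was created, which is the stale-context behaviour a pooled executor would need to avoid.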
*Attempts & Failures*
# *ThreadPoolTaskExecutor:* Caused Liquibase context failures because the
security context was not correctly propagated to reused threads.
# *TaskDecorator:* Manual context propagation was attempted, but copying every
context (MDC, Tenancy, Transaction) proved brittle and blocked startup events.
# *Concurrency Cap:* Capping concurrency on the {{SimpleAsyncTaskExecutor}}
caused deadlocks/timeouts during startup because the boot process requires
high parallelism (100+ concurrent threads).
*Proposed Improvement* We need to redesign the threading model to:
# Decouple the startup/bootstrapping phase (which may require unbounded
threads) from the runtime phase.
# Implement a safe mechanism for Context Propagation that is compatible with
pooling (e.g., using {{TransmittableThreadLocal}} or a robust
{{TaskDecorator}}).
# Migrate to a bounded {{ThreadPoolTaskExecutor}} for runtime tasks to prevent
resource exhaustion.
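
As a rough illustration of points 2 and 3, the following JDK-only sketch captures the submitter's context at submission time and installs it around the task on a bounded pool. It is not the proposed Fineract implementation: {{CONTEXT}}, {{withContext}}, {{boundedPool}} and {{runAs}} are hypothetical names, only one context is propagated, and the pool sizes are arbitrary. In Spring the same wrapping would typically live in a {{TaskDecorator}} registered on a {{ThreadPoolTaskExecutor}}.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ContextPropagatingPoolSketch {
    // Hypothetical stand-in for the security/tenant context; a plain ThreadLocal suffices here.
    static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    /** Captures the caller's context now and installs it around the task when it runs. */
    static <T> Callable<T> withContext(Callable<T> task) {
        final String captured = CONTEXT.get(); // evaluated on the submitting thread
        return () -> {
            final String previous = CONTEXT.get(); // whatever the executing thread held before
            CONTEXT.set(captured);
            try {
                return task.call();
            } finally {
                // Restore the executing thread's state so nothing leaks into the next task.
                if (previous == null) {
                    CONTEXT.remove();
                } else {
                    CONTEXT.set(previous);
                }
            }
        };
    }

    /** Bounded pool: fixed thread count, bounded queue, caller-runs as back-pressure when full. */
    static ThreadPoolExecutor boundedPool(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(threads, threads, 30, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    /** Sets a context on the caller, submits a wrapped task, and returns what the task saw. */
    static String runAs(ThreadPoolExecutor pool, String context) {
        CONTEXT.set(context);
        try {
            Callable<String> read = CONTEXT::get;
            return pool.submit(withContext(read)).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            CONTEXT.remove();
        }
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = boundedPool(1, 4);
        System.out.println(runAs(pool, "tenant-A"));
        System.out.println(runAs(pool, "tenant-B")); // same worker thread, fresh context
        pool.shutdown();
    }
}
```

Because the context is captured per task rather than per thread, the second call sees its own value even though the worker thread is reused. A real implementation would have to capture and restore every context the issue lists (MDC, Tenancy, Transaction), which is where the earlier {{TaskDecorator}} attempt ran into trouble.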