Hi Jakub,
During recent discussions, several areas for improvement in Fineract were 
mentioned. I’d like to highlight two recommendations in particular that I 
believe are excellent candidates for your performance optimization initiative:

1. Job Performance Optimization (Originally suggested by Arnold)

“One of the things Fineract clearly struggles with is the performance of jobs. 
These are mostly single-threaded implementations that begin to suffer even 
under moderate data volumes.
Many could be rewritten or enhanced to run as Spring Batch partitioned and 
chunked jobs.
While Fineract does include one scaled job (Loan COB), the rest are implemented 
as Spring Batch tasklets, which are either single-threaded or only parallelized 
within the tasklet itself. Neither approach is well-suited to handling 
large-scale datasets.”

Fineract’s job system plays a critical role in core functions like interest 
calculation, accrual recognition, event dispatching, report generation, and 
dividend payouts. As Arnold noted, the current implementations are suboptimal 
in terms of performance. 

Redesigning and rewriting these jobs with scalability in mind would be a highly 
valuable contribution to the project — one with clear and measurable impact.
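
To make the first recommendation concrete, here is a minimal sketch of a partitioned, chunked job. It assumes the Spring Batch 5 builder API; the step names, grid size, ID range, and the range-aware reader/writer beans are illustrative, not Fineract’s actual code:

import java.util.HashMap;
import java.util.Map;

import org.springframework.batch.core.Step;
import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.TaskExecutor;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
class PartitionedAccrualJobConfig {

    // Splits the full loan ID space into contiguous ranges, one per worker.
    static class IdRangePartitioner implements Partitioner {

        private final long minId;
        private final long maxId;

        IdRangePartitioner(long minId, long maxId) {
            this.minId = minId;
            this.maxId = maxId;
        }

        @Override
        public Map<String, ExecutionContext> partition(int gridSize) {
            Map<String, ExecutionContext> partitions = new HashMap<>();
            long rangeSize = (maxId - minId) / gridSize + 1;
            for (int i = 0; i < gridSize; i++) {
                ExecutionContext ctx = new ExecutionContext();
                // A step-scoped reader would pick these up and query only its slice.
                ctx.putLong("fromId", minId + i * rangeSize);
                ctx.putLong("toId", Math.min(minId + (i + 1) * rangeSize - 1, maxId));
                partitions.put("partition" + i, ctx);
            }
            return partitions;
        }
    }

    @Bean
    Step accrualManagerStep(JobRepository jobRepository, Step accrualWorkerStep,
            TaskExecutor taskExecutor) {
        return new StepBuilder("accrualManagerStep", jobRepository)
                .partitioner("accrualWorkerStep", new IdRangePartitioner(1, 1_000_000))
                .step(accrualWorkerStep)
                .gridSize(8)                 // 8 concurrent partitions
                .taskExecutor(taskExecutor)  // runs the workers in parallel
                .build();
    }

    @Bean
    Step accrualWorkerStep(JobRepository jobRepository, PlatformTransactionManager txManager,
            ItemReader<Long> rangeAwareLoanIdReader, ItemWriter<Long> accrualWriter) {
        return new StepBuilder("accrualWorkerStep", jobRepository)
                .<Long, Long>chunk(500, txManager)  // commit every 500 items
                .reader(rangeAwareLoanIdReader)     // reads only its [fromId, toId] slice
                .writer(accrualWriter)
                .build();
    }
}

Each partition restarts independently on failure, and throughput scales with the grid size rather than being capped by a single thread.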

2. JPA Usage and Entity Fetching Patterns

This is another area with significant room for improvement. Most database 
interactions in Fineract go through JPA. For instance, submitting a new loan 
involves creating a Loan entity, setting its fields, and persisting it via the 
entity manager.

When performing operations on existing loans, Fineract often loads the Loan 
entity along with many associated entities — far more than typically necessary.

Example: Making a Loan Repayment

When fetching a loan, the following associated data may also be loaded:

Client info → Office, Image, Staff

Group info → Office, Staff, Members, Group level

Group Loan Individual Monitoring Account

Fund info

Loan Officer info → Staff

Interest Recalculation, Top-Up Details

Although many of these associations are marked as LAZY, the method
LoanAssemblerImpl#assembleFrom(Long) contains logic that explicitly fetches 
extensive related data, including:

Loan charges → charge details, tax group, payment type, etc.

Tranche charges

Repayment installments

Transactions → office, charge mappings, etc.

Disbursement details

Term variations

Collaterals and related management

Loan officer assignment history

As you can see, a large amount of data is fetched — much of which is not 
necessary for a simple repayment operation. For example, top-up details, 
disbursement info, or officer assignment history are likely irrelevant in this 
context.
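
To make this concrete, a narrower read model for the repayment flow could look like the sketch below. It assumes Spring Data JPA with Hibernate 6; the projection fields (currencyCode, principalOutstanding) and the repository itself are hypothetical, not existing Fineract code:

package org.example.loan; // hypothetical package, referenced in the query below

import java.math.BigDecimal;
import java.util.Optional;

import org.apache.fineract.portfolio.loanaccount.domain.Loan;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.Repository;
import org.springframework.data.repository.query.Param;

// Narrow read model: only what the repayment flow actually needs.
record LoanRepaymentView(Long id, String currencyCode, BigDecimal principalOutstanding) {}

interface LoanRepaymentReadRepository extends Repository<Loan, Long> {

    // A constructor-expression query selects three columns instead of
    // hydrating the whole Loan graph (client, group, charges, transactions, ...).
    // The attribute names are illustrative; Fineract's Loan maps them differently.
    @Query("""
            select new org.example.loan.LoanRepaymentView(l.id, l.currencyCode, l.principalOutstanding)
            from Loan l
            where l.id = :id
            """)
    Optional<LoanRepaymentView> findRepaymentView(@Param("id") Long id);
}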

This is just one use case. I strongly believe that by carefully reviewing each 
operation and selectively loading only the necessary data, we can significantly 
improve performance and reduce infrastructure overhead.

If tackling every case individually feels too complex, a good starting point 
could be:

Removing some of the unnecessary associations from the Loan entity

Minimizing eager loading

Fetching related data only when explicitly needed

Example:
When creating a new loan transaction, there is usually no need to fetch the 
loan’s existing transactions (a few flows may be exceptions), as the sketch 
below shows.
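
A minimal sketch of that idea, assuming jakarta.persistence; RepaymentTransaction and the service are hypothetical stand-ins, not Fineract’s actual LoanTransaction:

import java.math.BigDecimal;

import jakarta.persistence.Entity;
import jakarta.persistence.EntityManager;
import jakarta.persistence.FetchType;
import jakarta.persistence.GeneratedValue;
import jakarta.persistence.GenerationType;
import jakarta.persistence.Id;
import jakarta.persistence.JoinColumn;
import jakarta.persistence.ManyToOne;
import jakarta.persistence.PersistenceContext;

import org.apache.fineract.portfolio.loanaccount.domain.Loan;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

// Simplified stand-in for a transaction entity. The child owns the FK, so
// inserting it never requires loading the loan's transaction collection.
@Entity
class RepaymentTransaction {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    Long id;

    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "loan_id")
    Loan loan;

    BigDecimal amount;
}

@Service
class RepaymentRecorder {

    @PersistenceContext
    private EntityManager entityManager;

    @Transactional
    public void recordRepayment(Long loanId, BigDecimal amount) {
        RepaymentTransaction tx = new RepaymentTransaction();
        // getReference returns a lazy proxy: no SELECT of the Loan, and
        // no SELECT of its existing transactions either.
        tx.loan = entityManager.getReference(Loan.class, loanId);
        tx.amount = amount;
        entityManager.persist(tx); // a single INSERT
    }
}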

3. Additional Note: Primary Key Generation Strategy

One issue that hasn’t been discussed yet is the suboptimal strategy used for 
primary key generation.

Fineract currently supports three database engines:

MySQL (original)

MariaDB (added later)

PostgreSQL (added more recently)

Because MySQL does not support sequences (and MariaDB only gained them in 
version 10.3), Fineract relies on identity (auto-increment) columns for PK 
generation. This means:

The INSERT must be executed before the generated ID is available, so the 
persistence provider cannot delay or batch these statements.

As a result, multiple flushes occur during transactions — especially when IDs 
are needed immediately (e.g., for external events or JDBC queries).

Currently, PK fields are Long and auto-generated.
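
To illustrate the effect, here is a minimal sketch (the service is hypothetical; jakarta.persistence and Fineract’s Loan entity are assumed):

import jakarta.persistence.EntityManager;
import jakarta.persistence.PersistenceContext;

import org.apache.fineract.portfolio.loanaccount.domain.Loan;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
class LoanSubmissionService {

    @PersistenceContext
    private EntityManager entityManager;

    @Transactional
    public Long submitLoan(Loan loan) {
        // With IDENTITY keys the provider cannot delay the INSERT: it has to
        // execute it at persist() so the generated id can be populated. Every
        // point in a transaction that needs the id this way pushes SQL out
        // early and defeats JDBC insert batching.
        entityManager.persist(loan);
        return loan.getId(); // known only because the INSERT already ran
    }
}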

My recommendation:

Switch from Long to String or UUID

Stop relying on database-generated IDs

Generate IDs within Fineract itself to avoid unnecessary flushes and improve 
consistency across database engines

Whether we use UUIDs, NanoIDs, or other formats (e.g., VARCHAR(22)) is a topic 
for broader discussion, perhaps on the mailing list. But moving away from 
auto-generated, database-dependent, and easily guessable IDs would be a step 
forward for both performance and architecture.
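
For completeness, a minimal sketch of an application-assigned key, using a hypothetical entity and a plain string UUID (jakarta.persistence assumed):

import java.util.UUID;

import jakarta.persistence.Column;
import jakarta.persistence.Entity;
import jakarta.persistence.Id;

// Hypothetical entity with an application-assigned key: the id exists the
// moment the object is constructed, before any persist() or flush(), and it
// works the same way on MySQL, MariaDB, and PostgreSQL.
@Entity
class ExampleEvent {

    @Id
    @Column(length = 36, updatable = false, nullable = false)
    private String id = UUID.randomUUID().toString();

    String getId() {
        return id; // no database round trip needed
    }
}

One caveat for that discussion: with assigned IDs, Spring Data can no longer detect new entities by a null id, so an explicit isNew strategy (e.g., implementing Persistable) becomes necessary; and purely random UUIDs can hurt insert locality in clustered indexes on MySQL/MariaDB, which is an argument for time-ordered formats.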

I hope this provides some helpful context and direction!

Best regards,
Adam
