Hi Jakub,

During recent discussions, several areas for improvement in Fineract were mentioned. I'd like to highlight two recommendations in particular that I believe are excellent candidates for your performance optimization initiative:
1. Job Performance Optimization (originally suggested by Arnold)

"One of the things Fineract clearly struggles with is the performance of jobs. These are mostly single-threaded implementations that begin to suffer even under moderate data volumes. Many could be rewritten or enhanced to run as Spring Batch partitioned and chunked jobs. While Fineract does include one scaled job (Loan COB), the rest are implemented as Spring Batch tasklets, which are either single-threaded or only parallelized within the tasklet itself. Neither approach is well-suited to handling large-scale datasets."

Fineract's job system plays a critical role in core functions like interest calculation, accrual recognition, event dispatching, report generation, and dividend payouts. As Arnold noted, the current implementations are suboptimal in terms of performance. Redesigning and rewriting these jobs with scalability in mind would be a highly valuable contribution to the project, one with clear and measurable impact.

2. JPA Usage and Entity Fetching Patterns

This is another area with significant room for improvement. Most database interactions in Fineract go through JPA. For instance, submitting a new loan involves creating a Loan entity, setting its fields, and persisting it via the entity manager. When performing operations on existing loans, Fineract often loads the Loan entity along with many associated entities, far more than is typically necessary.

Example: making a loan repayment. When fetching a loan, the following associated data may also be loaded:

- Client info → Office, Image, Staff
- Group info → Office, Staff, Members, Group level
- Group Loan Individual Monitoring Account
- Fund info
- Loan Officer info → Staff
- Interest Recalculation and Top-Up details

Despite many of these associations being marked as LAZY, the method LoanAssemblerImpl#assembleFrom(Long) contains logic that explicitly fetches extensive related data, including:

- Loan charges → charge details, tax group, payment type, etc.
- Tranche charges
- Repayment installments
- Transactions → office, charge mappings, etc.
- Disbursement details
- Term variations
- Collaterals and related management
- Loan officer assignment history

As you can see, a large amount of data is fetched, much of which is not necessary for a simple repayment operation. For example, top-up details, disbursement info, or officer assignment history are likely irrelevant in this context.

This is just one use case. I strongly believe that by carefully reviewing each operation and selectively loading only the necessary data, we can significantly improve performance and reduce infrastructure overhead. If tackling every case individually feels too complex, a good starting point could be:

- Removing some of the unnecessary associations from the Loan entity
- Minimizing eager loading
- Fetching related data only when explicitly needed

Example: when creating a new loan transaction, there is no need to fetch all of the loan's existing transactions (exceptions may apply).

3. Additional Note: Primary Key Generation Strategy

One issue that hasn't been discussed yet is the suboptimal strategy used for primary key generation. Fineract currently supports three database engines:

- MySQL (original)
- MariaDB (added later)
- PostgreSQL (added more recently)

Because MySQL and MariaDB do not support sequences, Fineract relies on identity columns for PK generation. This means:

- You must flush the persistence context to retrieve the generated ID.
- As a result, multiple flushes occur during transactions, especially when IDs are needed immediately (e.g., for external events or JDBC queries).

Currently, PK fields are Long and auto-generated. My recommendation:

- Switch from Long to String or UUID
- Stop relying on database-generated IDs
- Generate IDs within Fineract itself to avoid unnecessary flushes and improve consistency across database engines

Whether we use UUIDs, NanoIDs, or another format (e.g., VARCHAR(22)) is a topic for broader discussion, perhaps on the mailing list.
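To make the in-application ID generation idea concrete, here is a minimal sketch (plain Java with hypothetical names, not actual Fineract code), assuming canonical UUID strings are an acceptable key format:

```java
import java.util.UUID;

// Hypothetical sketch: the ID is generated in the application rather than
// by the database, so it is known before persist() and no flush is needed
// just to obtain the key.
public final class ClientSideIdGenerator {

    private ClientSideIdGenerator() {
    }

    // Canonical 36-character UUID string; would fit a VARCHAR(36) column.
    public static String nextId() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        String id = nextId();
        // On a JPA entity, a value like this would be assigned to a plain
        // @Id field in the constructor, replacing
        // @GeneratedValue(strategy = GenerationType.IDENTITY).
        System.out.println(id.length()); // prints 36
    }
}
```

Because the key is assigned before persisting, the behavior is identical on MySQL, MariaDB, and PostgreSQL, and the persistence context no longer has to be flushed just to hand the ID to external events or JDBC queries.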
But moving away from auto-generated, database-dependent, and easily guessable IDs would be a step forward for both performance and architecture.

I hope this provides some helpful context and direction!

Best regards,
Adam
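P.S. To make the partitioning idea from point 1 a bit more concrete, here is a language-level sketch (plain Java with hypothetical names, not Spring Batch or Fineract code) of splitting a key range into partitions and processing each on its own worker thread:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: the essence of a partitioned, chunked job is to
// split the full key range into partitions and process each partition
// independently instead of iterating single-threaded.
public class PartitionedJobSketch {

    // Split the ID list into at most `partitions` roughly equal slices.
    static List<List<Long>> partition(List<Long> ids, int partitions) {
        List<List<Long>> result = new ArrayList<>();
        int size = (ids.size() + partitions - 1) / partitions;
        for (int i = 0; i < ids.size(); i += size) {
            result.add(ids.subList(i, Math.min(i + size, ids.size())));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        List<Long> loanIds = new ArrayList<>();
        for (long i = 1; i <= 10_000; i++) {
            loanIds.add(i);
        }

        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<Long>> results = new ArrayList<>();
        for (List<Long> part : partition(loanIds, 4)) {
            results.add(pool.submit(() -> {
                long processed = 0;
                // In Spring Batch terms, each slice would itself be handled
                // chunk by chunk, one transaction per chunk.
                for (Long id : part) {
                    processed++;
                }
                return processed;
            }));
        }

        long total = 0;
        for (Future<Long> f : results) {
            total += f.get();
        }
        pool.shutdown();
        System.out.println(total); // prints 10000
    }
}
```

In actual Spring Batch, the slicing would be done by a Partitioner feeding worker steps, and each worker would run a chunk-oriented step (reader/processor/writer) so that restartability and transaction boundaries are handled by the framework rather than by hand.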