Hey Joseph,

Are you using Fineract v1.2 or above? The core jobs are multithreaded in
that version and fetch / commit batch sizes can be configured.

K2, can you confirm if the JMeter scripts you used to test were merged and
available in Fineract v1.2?

> We are using a 4vCPU 128GB server to do the load testing. Our target for
> 1.2M savings accounts is less than 1 hour.

> We were able to reduce the post interest to savings schedule job run time
> from 15-20 hours to 6 hours with these modifications.

Applying the same fix on top of Fineract 1.2 might bring it down to less
than 2 hours in that case.

@Ed Cable <[email protected]>

> and start a collaborative effort to carry out these performance and load
> testing exercises on a regular basis and a group of contributors focused on
> identifying and fixing the issues to improve performance.


Please find below a list of items, not specific to the interest posting job
but more about transactions in general :-


Code Change :-

1. OptimisticLockException fix for concurrent transactions affecting the
same account.
2. (Suggested by Nayan) Use Spring Retry to restart the txn on
LockWaitTimeouts where the lock waits are unavoidable (see the sketch after
this list).
3. Remove reliance on txn_running_balance (it reflects wrong values anyway
when concurrent txns are done on the same account).
4. Move officially to MySQL 5.7 if it's not official yet. Decide whether to
change queries to support the default ONLY_FULL_GROUP_BY or to change the
SQL mode.
5. Archival & partition strategy for transactions. (I think this should be
the highest priority item as it will have a positive cascading effect on
almost all core API calls, batch jobs and reports.)
6. Selectively redirect read APIs to a replica DB. Consult a DB admin for
the right replication and clustering configuration to make this feasible
based on usage, replication lag, etc.
7. Use Hystrix to short-circuit failing integration touchpoints.
8. Separate the modules at a very high level (different discussion).
9. Multi-node EhCache has some issues.
10. Traffic shaping / congestion control - control bandwidth and burstiness
at an API level.
11. Ensure most indexes are in the read replica store and the OLTP DB does
not have indexes used by reports.
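
For item 2, here is a minimal sketch of what the Spring Retry wrapper could
look like, assuming Spring Retry is on the classpath and that the lock wait
timeout surfaces as Spring's CannotAcquireLockException after exception
translation. The class and method names below are made up for illustration
and are not actual Fineract classes :-

import org.springframework.context.annotation.Configuration;
import org.springframework.dao.CannotAcquireLockException;
import org.springframework.retry.annotation.Backoff;
import org.springframework.retry.annotation.EnableRetry;
import org.springframework.retry.annotation.Retryable;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Configuration
@EnableRetry
class RetryConfiguration {
    // turns on processing of @Retryable annotations
}

@Service
public class RetryingSavingsTransactionService { // hypothetical name

    // If the DB reports a lock wait timeout, roll back and re-run the whole
    // unit of work a few times with exponential backoff instead of failing
    // the API call outright.
    // Make sure the retry advice wraps the transaction so that each attempt
    // runs in a fresh txn (check advisor ordering in your setup).
    @Retryable(value = CannotAcquireLockException.class,
            maxAttempts = 3,
            backoff = @Backoff(delay = 200, multiplier = 2.0))
    @Transactional
    public void postTransaction(Long savingsAccountId) {
        // ... existing deposit / withdrawal / interest posting logic ...
    }
}

A programmatic RetryTemplate around the command handler gives the same
effect without annotations, if that fits the existing code structure better.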

Configuration Change / Ship Configuration Profiles :-

1. Different profiles for the tenant_server_connections thread pool config
(see the sketch after this list).
2. my.cnf configuration for optimizing DB performance for already available
hardware resources.
3. Tenants DB thread pool size in the web server configuration file.
4. Include profiles for clustering - Percona, Aurora, Galera?
5. Include config for reverse proxy - Nginx?
6. Include Tomcat 8 NIO server config file?
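
For item 1, an illustration (not Fineract's actual wiring) of the kind of
per-tenant pool knobs worth profiling, written against the plain Tomcat JDBC
pool API; the numbers are placeholders to be derived from load tests, not
recommendations :-

import org.apache.tomcat.jdbc.pool.DataSource;
import org.apache.tomcat.jdbc.pool.PoolProperties;

public class TenantPoolSketch { // hypothetical helper, for illustration only

    public static DataSource buildTenantDataSource(String jdbcUrl,
            String user, String password) {
        PoolProperties p = new PoolProperties();
        p.setUrl(jdbcUrl);
        p.setDriverClassName("com.mysql.jdbc.Driver");
        p.setUsername(user);
        p.setPassword(password);

        // The knobs a load-test profile would typically vary per tenant:
        p.setInitialSize(5);            // connections opened up front
        p.setMaxActive(50);             // hard cap; keep the total across
                                        // tenants below MySQL max_connections
        p.setMinIdle(5);
        p.setMaxIdle(20);
        p.setMaxWait(30000);            // ms to wait for a free connection
        p.setTestOnBorrow(true);
        p.setValidationQuery("SELECT 1");
        p.setValidationInterval(30000); // ms between re-validations
        p.setRemoveAbandoned(true);
        p.setRemoveAbandonedTimeout(60);

        DataSource ds = new DataSource();
        ds.setPoolProperties(p);
        return ds;
    }
}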

Infra Setup :-

1. Establish that network latency isn't the culprit - install the mysql
client CLI on your app server, connect to the DB server using the mysql
client and see how many ms a super simple query takes. Not an infra expert,
but please find a common misconfiguration example below.
Ex :-
select id from m_code where id = 1;
AWS - Same subnet - 0.3 ms
AWS - Same VPC - 6 ms
AWS - Across VPCs - 300 ms

Behaviour Change (Customizing Mifos for use cases Mifos wasn't designed
for) :-

1. Txn API TPS - Review how Fineract features can be used to solve business
use cases. Ex :- splitting a single collection account into different
accounts at the merchant / office / virtual office / region level. You can
tag the collection use case with a payment type as well.


Identifying / Reporting Issues :-

1. How to debug / get relevant logs (a bit subjective) - aimed towards
correctly reporting the issue (if not including a test case to reproduce
the issue) :-
1a. Slow queries in write txn APIs standing out in production server logs.
1b. Local application debug (connect to the QA env DB if reproduction steps
are highly data dependent).
1c. Switch show-queries to true to flood the logs with all queries
generated by JPA. Notice repetitive queries (self references, Ex :- the
Office entity eagerly fetching its parent and list of children) and highly
propagating queries (eager fetches) by clearing logs before and after the
txn.
1d. Profile or put timers at critical lines of DB interaction and subtract
from the previous timer value to spot bottleneck queries (see the sketch
after this list).
1e. Remote debug two concurrent API calls together. Put a breakpoint at the
start of the relevant service and bring both threads forward one at a time
till one thread reaches the lock-acquiring query and the other thread goes
into a wait. See what DB operations remain after that to get an idea of how
long that lock will be held. Minimize the amount of time it holds on to the
lock to minimize the chances of LockWaitTimeout exceptions.
2. Review indexes on core tables. Did someone put an index on a timestamp
column? Did that index addition go through a PR or was it executed directly
on the environment? Tighten control to prevent users from directly changing
the DB (DDL statements at least). Introduce more indexes?
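
For 1d, a minimal sketch of the timer approach using Spring's StopWatch
(plain System.nanoTime() works just as well); the commented repository calls
are placeholders for whatever DB interactions sit in the code path being
investigated :-

import org.springframework.util.StopWatch;

public class InterestPostingTimingSketch { // illustrative only

    public void timePostInterest() {
        StopWatch watch = new StopWatch("post-interest");

        watch.start("load accounts page");
        // Page<SavingsAccount> accounts = repository.findByStatus(status, pageable);
        watch.stop();

        watch.start("bulk load transactions");
        // savingsAccountTransactionRepository.findBySavingsAccountIdList(idList);
        watch.stop();

        watch.start("calculate and persist interest");
        // ... domain logic + saves ...
        watch.stop();

        // Prints each task's elapsed time and share of the total; the task
        // with the largest share points to the query worth digging into with
        // EXPLAIN or the slow query log.
        System.out.println(watch.prettyPrint());
    }
}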


Test Suite Development :-

1. Add more JMeter test scripts and integrate them with a CI cum load
testing server to catch regressions on code changes.

Commercial Tools :-

1. APM tool (New Relic, Dynatrace, Nagios, AppDynamics) - makes it easy to
do an operational review of the queries with the highest counts (rather
than the longest running queries in the slow query log) - when the DB is
choking on a low-resource server, all queries will perform much worse than
in normal conditions.
2. DB monitoring tool - Percona Monitoring and Management.
3. Nginx Plus or HAProxy


With best regards,
Avik.


On Tue, Oct 22, 2019 at 1:38 AM Ed Cable <[email protected]> wrote:

> Nayan,
>
> Thanks for assisting Joseph with those suggestions and Joseph thanks for
> sharing the initial results of improvements after indexing. I'm happy you
> emailed about this as performance testing and improving performance for
> high scalability environments is a topic of wide importance for many
> members of the current ecosystem and a number of prospective new adopters
> that are looking at the stack.
>
> As evident from your searching and the input you have received thus far on
> the thread what's publicly out there on performance statistics and
> fine-tuning performance is limited. Yet, many of our implementers have run
> Mifos/Fineract in high load environments and have a lot of wisdom and
> experience to share.
>
> @Avik Ganguly <[email protected]> based on your experiences, are you able
> to give some additional suggestions on top of what Nayan has already
> provided.
>
> I will get a separate email thread going to start a collaborative effort
> across the community to get in place a set of reproducible tools to do
> ongoing performance testing on Fineract and Fineract CN, and start a
> collaborative effort to carry out these performance and load testing
> exercises on a regular basis and a group of contributors focused on
> identifying and fixing the issues to improve performance.
>
> We welcome your contributions to those efforts
>
> Ed
>
>
>
> On Mon, Oct 21, 2019 at 7:14 AM Joseph Cabral <[email protected]>
> wrote:
>
>> Hi everyone,
>>
>> I would like to share the initial results of our testing and get your
>> feedback.
>>
>> We modified the code for the post interest to savings scheduler job,
>> specifically the findByStatus method in SavingsRepositoryWrapper where it
>> queries the active savings accounts. We noticed that the current design
>> depended on lazy-load fetching to populate the transactions and charges
>> lists of each savings account. From previous experience with other systems
>> this has been a cause of various slowdowns, so we focused on modifying this
>> part. We decided to query the savings transaction and charges in bulk to
>> reduce the number of database calls. See below for our implementation.
>>
>> We also removed the CascadeType.ALL and FetchType.Lazy settings for the
>> transactions and charges list of the SavingsAccount entity as we are
>> already manually fetching their contents. We will do further testing as
>> this may have an impact on other modules.
>>
>> @Transactional(readOnly=true)
>>     public Page<SavingsAccount> findByStatus(Integer status, Pageable pageable) {
>>         logger.info("findByStatus - Start querying savings account");
>>         Page<SavingsAccount> accounts = this.repository.findByStatus(status, pageable);
>>         List<Long> idList = new ArrayList<Long>();
>>         Map<Long, SavingsAccount> accountsMap = new HashMap<Long, SavingsAccount>();
>>         if(accounts != null) {
>>             for(SavingsAccount account : accounts) {
>>                 account.setCharges(new HashSet<>());
>>                 account.setTransactions(new ArrayList<>());
>>
>>                 idList.add(account.getId());
>>                 accountsMap.put(account.getId(), account);
>>             }
>>             List<SavingsAccountTransaction> savingsAccountTransactionList =
>>                 savingsAccountTransactionRepository.findBySavingsAccountIdList(idList);
>>             if(savingsAccountTransactionList != null) {
>>                 for(SavingsAccountTransaction transaction : savingsAccountTransactionList) {
>>                     SavingsAccount account =
>>                         accountsMap.get(transaction.getSavingsAccount().getId());
>>                     account.getTransactions().add(transaction);
>>                 }
>>             }
>>
>>             Set<SavingsAccountCharge> savingsAccountChargeList =
>>                 savingsAccountChargeRepository.findBySavingsAccountIdList(idList);
>>             if(savingsAccountChargeList != null) {
>>                 for(SavingsAccountCharge charges : savingsAccountChargeList) {
>>                     SavingsAccount account =
>>                         accountsMap.get(charges.savingsAccount().getId());
>>                     account.getCharges().add(charges);
>>                 }
>>             }
>>         }
>>         logger.info("findByStatus - Finished querying savings account");
>>         // loadLazyCollections(accounts);
>>         return accounts;
>>     }
>>
>>
>> We were able to reduce the post interest to savings schedule job run time
>> from 15-20 hours to 6 hours with these modifications. After this we will
>> look at how to reduce the run time for the saving/updating part.
>>
>> I would like to ask if anyone has any alternative solutions or if you
>> have any feedback on how we implemented it?
>>
>> Regards,
>>
>> Joseph
>>
>> On Sun, Oct 20, 2019 at 11:02 PM Joseph Cabral <[email protected]>
>> wrote:
>>
>>> Hi Michael,
>>>
>>> Of course, we will give feedback if we are able to make improvements to
>>> our scheduler job run time.
>>>
>>> But if anyone else has any experience in load testing or running
>>> Fineract in high load environments I am open to suggestions.
>>>
>>> Thanks!
>>>
>>> Joseph
>>>
>>> On Sun, Oct 20, 2019 at 7:32 PM Michael Vorburger <[email protected]>
>>> wrote:
>>>
>>>> Joseph,
>>>>
>>>> On Sun, 20 Oct 2019, 00:28 Joseph Cabral, <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Nayan,
>>>>>
>>>>> Thank you for the tips! I tried asking for advice here first because
>>>>> we are wary of doing any code changes since we were under the assumption
>>>>> that Fineract had already been used in many high-load situations in
>>>>> production and that we had just set it up wrong or there were some
>>>>> settings we could change.
>>>>>
>>>>> We will first try adding some indexes to the database but it looks
>>>>> like we have to do some code change for this.
>>>>>
>>>>
>>>> Will you be contributing any performance related improvements you make
>>>> back to the community?
>>>>
>>>> Thanks again!
>>>>>
>>>>> Joseph
>>>>>
>>>>> On Sun, Oct 20, 2019 at 2:21 AM Nayan Ambali <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Joseph,
>>>>>>
>>>>>> Previously I had done Mifos platform load testing for the community,
>>>>>> based on my experience below are my recommendations
>>>>>>
>>>>>> *without code change*
>>>>>> 1. Use SSD/High IOPS storage for Database
>>>>>> 2. If you are on AWS, go for Aurora instead of MySQL
>>>>>> 3. Look at database usage for this batch job and see if there is
>>>>>> an opportunity to index some columns for better performance
>>>>>>
>>>>>> *with code change*
>>>>>> 1. Process the data in parallel, either with multi-threading on a single
>>>>>> node or by going for multiple nodes
>>>>>> 2. Query optimisation
>>>>>> 3. Data fetch and commit batch size
>>>>>> there are many other opportunities to improve the same
>>>>>>
>>>>>> -
>>>>>> at your service
>>>>>>
>>>>>> Nayan Ambali
>>>>>> +91 9591996042
>>>>>> skype: nayangambali
>>>>>>
>>>>>>
>>>>>> On Sat, Oct 19, 2019 at 2:20 PM Joseph Cabral <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Hi Everyone,
>>>>>>>
>>>>>>> I would like to ask if anyone has done load testing of the
>>>>>>> MifosX/Fineract scheduler jobs? Specifically the post interest to 
>>>>>>> savings
>>>>>>> job.
>>>>>>>
>>>>>>> We created a new Savings Product (ADB Monthly - 10% Interest) with
>>>>>>> the following settings:
>>>>>>> Nominal interest rate (annual): 10%
>>>>>>> Balance required for interest calculation: $1,000.00
>>>>>>> Interest compounding period: Monthly
>>>>>>> Interest posting period: Monthly
>>>>>>> Interest calculated using: Average Daily Balance
>>>>>>>
>>>>>>> We populated the m_savings_account table using the savings product
>>>>>>> above with 1.2M new savings accounts, into which we then deposited an initial
>>>>>>> balance of $10,000 each. We then edited the post interest to savings 
>>>>>>> job to
>>>>>>> post the interest even though it is not yet the end of the month.
>>>>>>>
>>>>>>> On consecutive tests, we averaged around 15 to 20 hours to complete
>>>>>>> the job.
>>>>>>>
>>>>>>> We are using a 4vCPU 128GB server to do the load testing. We
>>>>>>> deployed MifosX/Fineract in a docker container with Tomcat 7. The MySQL
>>>>>>> database is deployed in a separate docker container on the same machine.
>>>>>>>
>>>>>>> Any tips or ideas on what we can do to improve the run time of the
>>>>>>> job? Our target for 1.2M savings accounts is less than 1 hour.
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Joseph
>>>>>>>
>>>>>>
>
> --
> *Ed Cable*
> President/CEO, Mifos Initiative
> [email protected] | Skype: edcable | Mobile: +1.484.477.8649
>
> *Collectively Creating a World of 3 Billion Maries | *http://mifos.org
>
>
