Hi Fineract community, I would just like to get your inputs on the business logic of the post interest to savings batch.
Thank you!

Joseph

On Wed, Dec 4, 2019 at 10:49 PM Joseph Cabral <joseph.cabra...@gmail.com> wrote:

Hi Fineract community,

Sorry for the very late reply. We have been trying different things to improve the batch run time of the post interest to savings job, but to no avail.

We have tried various things like upgrading Spring Boot, migrating to Hibernate, and using Spring Data JPA instead of OpenJPA so we can use Java Stream or ScrollableResultSet, but none of it seems to have any effect on the performance. We also tried modifying the Hibernate batch_size setting to use its batched inserts and updates. Some of the techniques we tried even worsened the performance compared to before.

One thing that did work for us is using the official MySQL driver instead of Drizzle. This improved performance by around 20%. However, we are still not near our target of 1 to 2 hours.

Hi Nayan, Avik,

Noted, and thank you very much for your recommendations. We will try them out.

Regarding your comment on the Fineract version, we are currently testing on the develop branch, so we should have the latest changes from v1.2 to v1.4.

How do we test multithreading? How do we set this up or configure it?

We have some questions on the business logic of the post interest to savings batch job.

1. The SavingsAccount postInterest() method iterates a lot through the SavingsAccountTransaction and PostingPeriod lists of each SavingsAccount. This seems to be a possible cause of the slowdown, as the batch gets slower as the number of transactions increases, and it results in O(N²) complexity. Do you agree with this? Will multithreading help here?

2. The updateSummary() method iterates through all the SavingsAccountTransactions of each SavingsAccount to compute the total deposits, total withdrawals, etc., and it does this on every call to the postInterest() method. Our question here is: why does it need to do this? Is it to make sure that the total values are accurate? Why does it not just add the latest SavingsAccountTransaction to the previously computed total deposits and total withdrawals? (See the rough sketch at the end of this mail.)

We are very hesitant to make any code change in the business logic, but we think it can be a possible cause of slowdowns. Please correct us if you think otherwise.

Also, can we get some of the results you had in your own load testing?
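To make question 2 concrete, the incremental update we have in mind would look roughly like the sketch below. The class and field names are simplified placeholders for this mail, not the actual Fineract types, and backdated or reversed transactions would presumably still need a full recompute:

    import java.math.BigDecimal;

    // Sketch only: keep the running totals up to date as each new transaction
    // is applied, instead of re-iterating the whole transaction list on every
    // postInterest() call - O(1) per new transaction instead of O(N).
    class SavingsSummarySketch {

        private BigDecimal totalDeposits = BigDecimal.ZERO;
        private BigDecimal totalWithdrawals = BigDecimal.ZERO;

        void applyTransaction(BigDecimal amount, boolean isCredit) {
            if (isCredit) {
                totalDeposits = totalDeposits.add(amount);
            } else {
                totalWithdrawals = totalWithdrawals.add(amount);
            }
        }

        BigDecimal totalDeposits() { return totalDeposits; }
        BigDecimal totalWithdrawals() { return totalWithdrawals; }
    }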
Thank you!

Joseph

On Mon, Nov 11, 2019 at 10:59 AM Avik Ganguly <a...@fynarfin.io> wrote:

Hey Joseph,

Are you using Fineract v1.2 or above? The core jobs are multithreaded in that version and the fetch / commit batch sizes can be configured.

K2, can you confirm if the JMeter scripts you used to test were merged and available in Fineract v1.2?

> We are using a 4vCPU 128GB server to do the load testing. Our target for 1.2M savings accounts is less than 1 hour.

> We were able to reduce the post interest to savings schedule job run time from 15-20 hours to 6 hours with these modifications.

Applying the same fix on top of Fineract 1.2 might bring it down to less than 2 hours in that case.

@Ed Cable <edca...@mifos.org>

> and start a collaborative effort to carry out these performance and load testing exercises on a regular basis and a group of contributors focused on identifying and fixing the issues to improve performance.

Please find a list of items, not specific to the interest posting job but more about transactions in general:

Code Change:

1. OptimisticLockException fix for concurrent transactions affecting the same account.
2. (Suggested by Nayan) Use Spring Retry to restart the txn on LockWaitTimeouts where the lock waits are unavoidable (see the sketch after this list).
3. Remove the reliance on txn_running_balance (it reflects wrong values anyway when doing concurrent txns on the same account).
4. Move officially to MySQL 5.7 if it's not official yet. Decide whether to change queries to support the default ONLY_FULL_GROUP_BY mode or to change the SQL mode.
5. Archival and partition strategy for transactions. (I think this should be the highest priority item, as it will have a positive cascading effect on almost all core API calls, batch jobs and reports.)
6. Selectively redirect read API calls to a replica DB. Consult a DB admin for the right replication and clustering configuration to make this feasible based on usage, replication lag, etc.
7. Use Hystrix to short-circuit failing integration touchpoints.
8. Separate the modules at a very high level (different discussion).
9. Multi-node EhCache has some issues.
10. Traffic shaping / congestion control - control bandwidth and burstiness at an API level.
11. Ensure most indexes are in the read replica store and the OLTP DB does not carry indexes that are only used by reports.
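For item 2, a minimal sketch of the Spring Retry idea is below. It assumes the spring-retry dependency is on the classpath and a configuration class carries @EnableRetry; the exact exception thrown for a lock wait timeout should be verified before wiring this in:

    import org.springframework.dao.CannotAcquireLockException;
    import org.springframework.retry.annotation.Backoff;
    import org.springframework.retry.annotation.Retryable;
    import org.springframework.stereotype.Service;
    import org.springframework.transaction.annotation.Transactional;

    // Sketch only: retry the whole transactional method a few times when the
    // database reports a lock problem. Check the interceptor ordering so the
    // retry wraps the transaction and every attempt runs in a fresh txn.
    @Service
    public class SavingsTransactionServiceSketch {

        // CannotAcquireLockException is an assumption; map it to whatever the
        // data access layer actually throws for a LockWaitTimeout.
        @Retryable(value = CannotAcquireLockException.class, maxAttempts = 3,
                   backoff = @Backoff(delay = 200))
        @Transactional
        public void postTransaction(Long savingsAccountId) {
            // existing business logic that occasionally hits lock waits
        }
    }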
Configuration Change / Ship Configuration Profiles:

1. Different profiles for the tenant_server_connections thread pool config.
2. my.cnf configuration for optimizing DB performance on the already available hardware resources.
3. Tenants DB thread pool size in the web server configuration file.
4. Include profiles for clustering - Percona, Aurora, Galera?
5. Include config for a reverse proxy - Nginx?
6. Include a Tomcat 8 NIO server config file?

Infra Setup:

1. Establish that network latency isn't the issue - install the mysql client CLI on your app server, connect to the DB server using the mysql client and see how many ms a super simple query takes. I'm not an infra expert, but please find a common misconfiguration example:
Ex: select id from m_code where id = 1;
AWS - same subnet - 0.3 ms
AWS - same VPC - 6 ms
AWS - across VPCs - 300 ms

Behaviour Change (customizing Mifos for use cases Mifos wasn't designed for):

1. Txn API TPS - review how Fineract features can be used to solve business use cases, e.g. separating one single collection account into different accounts at the merchant / office / virtual office / region level. You can tag the collection use case with a payment type as well.

Identifying / Reporting Issues:

1. How to debug / get relevant logs (a bit subjective) - aimed towards correctly reporting the issue (if not including a test case to reproduce it):
1a. Slow queries in write txn APIs standing out in production server logs.
1b. Local application debugging (connect to the QA environment DB if the reproduction steps are highly data dependent).
1c. Switch the show-SQL setting to true to flood the logs with all queries generated by JPA. Notice repetitive queries (self references, e.g. the Office entity eagerly fetching its parent and its list of children) and highly propagating eager-fetch queries by clearing the logs before and after the txn.
1d. Profile or put timers at critical lines of DB interaction and subtract the previous timer value to spot bottleneck queries (see the small sketch after this list).
1e. Remote debug two concurrent API calls together. Put a breakpoint at the start of the relevant service and bring both threads forward one at a time until one thread reaches the lock-acquiring query and the other thread goes into a wait. See what DB operations remain after that to get an idea of how long that lock will be held. Minimize the amount of time it holds on to the lock to minimize the chances of LockWaitTimeout exceptions.
2. Review indexes on core tables. Did someone put an index on a timestamp column? Did that index addition go through a PR, or was it executed directly on the environment? Tighten control to prevent users from changing the DB directly (DDL statements at least). Introduce more indexes?
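As a small illustration of 1d, nothing more than System.nanoTime() around the suspected calls is needed; the repository call in the usage comment is only a placeholder:

    import java.util.function.Supplier;

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    // Sketch only: time a block of DB work and log the elapsed milliseconds so
    // the bottleneck queries stand out when comparing consecutive timers.
    final class TimedSketch {

        private static final Logger LOG = LoggerFactory.getLogger(TimedSketch.class);

        static <T> T log(String label, Supplier<T> work) {
            long start = System.nanoTime();
            T result = work.get();
            LOG.info("{} took {} ms", label, (System.nanoTime() - start) / 1_000_000);
            return result;
        }
    }

    // Usage, with a placeholder repository call:
    // List<SavingsAccountTransaction> txns =
    //         TimedSketch.log("fetch txns", () -> repository.findBySavingsAccountIdList(idList));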
Test Suite Development:

1. Add more JMeter test scripts, integrate them with a CI cum load testing server, and catch regressions on code changes.

Commercial Tools:

1. APM tool (New Relic, Dynatrace, Nagios, AppDynamics) - makes it easy to do an operational review of the queries with the highest counts (rather than just the longest-running queries in the slow query log); when the DB is choking on a low-resource server, all queries will perform much worse than in normal conditions.
2. DB monitoring tool - Percona Monitoring and Management.
3. Nginx Plus or HAProxy.

With best regards,
Avik.

On Tue, Oct 22, 2019 at 1:38 AM Ed Cable <edca...@mifos.org> wrote:

Nayan,

Thanks for assisting Joseph with those suggestions, and Joseph, thanks for sharing the initial results of improvements after indexing. I'm happy you emailed about this, as performance testing and improving performance for high-scalability environments is a topic of wide importance for many members of the current ecosystem and a number of prospective new adopters that are looking at the stack.

As evident from your searching and the input you have received thus far on the thread, what's publicly out there on performance statistics and fine-tuning performance is limited. Yet many of our implementers have run Mifos/Fineract in high-load environments and have a lot of wisdom and experience to share.

@Avik Ganguly <a...@fynarfin.io> based on your experiences, are you able to give some additional suggestions on top of what Nayan has already provided?

I will get a separate email thread going to start a collaborative effort across the community to get in place a set of reproducible tools to do ongoing performance testing on Fineract and Fineract CN, and start a collaborative effort to carry out these performance and load testing exercises on a regular basis and a group of contributors focused on identifying and fixing the issues to improve performance.

We welcome your contributions to those efforts.

Ed

On Mon, Oct 21, 2019 at 7:14 AM Joseph Cabral <joseph.cabra...@gmail.com> wrote:

Hi everyone,

I would like to share the initial results of our testing and get your feedback.

We modified the code for the post interest to savings scheduler job, specifically the findByStatus method in SavingsRepositoryWrapper where it queries the active savings accounts. We noticed that the current design depended on lazy-load fetching to populate the transactions and charges lists of each savings account. From previous experience with other systems this has been a cause of various slowdowns, so we focused on modifying this part. We decided to query the savings transactions and charges in bulk to reduce the number of database calls. See below for our implementation.

We also removed the CascadeType.ALL and FetchType.LAZY settings for the transactions and charges lists of the SavingsAccount entity, as we are already manually fetching their contents. We will do further testing as this may have an impact on other modules.

    @Transactional(readOnly = true)
    public Page<SavingsAccount> findByStatus(Integer status, Pageable pageable) {
        logger.info("findByStatus - Start querying savings account");
        Page<SavingsAccount> accounts = this.repository.findByStatus(status, pageable);
        List<Long> idList = new ArrayList<Long>();
        Map<Long, SavingsAccount> accountsMap = new HashMap<Long, SavingsAccount>();
        if (accounts != null) {
            for (SavingsAccount account : accounts) {
                account.setCharges(new HashSet<>());
                account.setTransactions(new ArrayList<>());

                idList.add(account.getId());
                accountsMap.put(account.getId(), account);
            }
            List<SavingsAccountTransaction> savingsAccountTransactionList =
                    savingsAccountTransactionRepository.findBySavingsAccountIdList(idList);
            if (savingsAccountTransactionList != null) {
                for (SavingsAccountTransaction transaction : savingsAccountTransactionList) {
                    SavingsAccount account = accountsMap.get(transaction.getSavingsAccount().getId());
                    account.getTransactions().add(transaction);
                }
            }

            Set<SavingsAccountCharge> savingsAccountChargeList =
                    savingsAccountChargeRepository.findBySavingsAccountIdList(idList);
            if (savingsAccountChargeList != null) {
                for (SavingsAccountCharge charges : savingsAccountChargeList) {
                    SavingsAccount account = accountsMap.get(charges.savingsAccount().getId());
                    account.getCharges().add(charges);
                }
            }
        }
        logger.info("findByStatus - Finished querying savings account");
        // loadLazyCollections(accounts);
        return accounts;
    }

We were able to reduce the post interest to savings schedule job run time from 15-20 hours to 6 hours with these modifications. After this we will look at how to reduce the run time for the saving/updating part.

I would like to ask if anyone has alternative solutions, or if you have any feedback on how we implemented it.

Regards,

Joseph
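A bulk-fetch repository method like the findBySavingsAccountIdList used above could be written as a Spring Data JPA query along the lines below. The association path t.savingsAccount mirrors the getSavingsAccount() call in the snippet; the rest is an assumption, not the actual implementation:

    import java.util.Collection;
    import java.util.List;

    import org.springframework.data.jpa.repository.JpaRepository;
    import org.springframework.data.jpa.repository.Query;
    import org.springframework.data.repository.query.Param;

    // Sketch only: fetch the transactions for every account id in one query
    // instead of lazy-loading them account by account. Keep the id list
    // bounded by the page size used in findByStatus.
    public interface SavingsAccountTransactionRepositorySketch
            extends JpaRepository<SavingsAccountTransaction, Long> {

        @Query("select t from SavingsAccountTransaction t where t.savingsAccount.id in :accountIds")
        List<SavingsAccountTransaction> findBySavingsAccountIdList(
                @Param("accountIds") Collection<Long> accountIds);
    }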
On Sun, Oct 20, 2019 at 11:02 PM Joseph Cabral <joseph.cabra...@gmail.com> wrote:

Hi Michael,

Of course, we will give feedback if we are able to make improvements to our scheduler job run time.

But if anyone else has any experience in load testing or running Fineract in high-load environments, I am open to suggestions.

Thanks!

Joseph

On Sun, Oct 20, 2019 at 7:32 PM Michael Vorburger <m...@vorburger.ch> wrote:

Joseph,

On Sun, 20 Oct 2019, 00:28 Joseph Cabral <joseph.cabra...@gmail.com> wrote:

> Hi Nayan,
>
> Thank you for the tips! I tried asking for advice here first because we are wary of doing any code change, since we were under the assumption that Fineract had already been used in many high-load situations in production and that we had just set it up wrong or there were some settings we could change.
>
> We will first try adding some indexes to the database, but it looks like we have to do some code changes for this.

Will you be contributing any performance-related improvements you make back to the community?

> Thanks again!
>
> Joseph

On Sun, Oct 20, 2019 at 2:21 AM Nayan Ambali <nayan.amb...@gmail.com> wrote:

Joseph,

Previously I had done Mifos platform load testing for the community; based on my experience, below are my recommendations.

Without code change:
1. Use SSD / high-IOPS storage for the database.
2. If you are on AWS, go for Aurora instead of MySQL.
3. Look at the database usage for this batch job and see if there is an opportunity to index some columns for better performance.

With code change:
1. Process the data in parallel, either with multithreading on a single node or across multiple nodes (a rough sketch follows below).
2. Query optimisation.
3. Data fetch and commit batch sizes.

There are many other opportunities to improve the same.
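A very rough sketch of the single-node multithreading option, just to show the shape - chunk sizing, transaction boundaries and error handling are left out, and postInterestForAccounts() is a placeholder for the existing per-account posting logic:

    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    // Sketch only: split the active savings account ids into chunks and post
    // interest for each chunk on a worker thread.
    public class ParallelInterestPostingSketch {

        public void postInterestInParallel(List<Long> activeAccountIds, int threads, int chunkSize)
                throws InterruptedException {
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            for (int i = 0; i < activeAccountIds.size(); i += chunkSize) {
                List<Long> chunk =
                        activeAccountIds.subList(i, Math.min(i + chunkSize, activeAccountIds.size()));
                pool.submit(() -> postInterestForAccounts(chunk));
            }
            pool.shutdown();
            pool.awaitTermination(12, TimeUnit.HOURS);
        }

        private void postInterestForAccounts(List<Long> accountIds) {
            // Placeholder: load each account with its transactions and run the
            // existing postInterest() logic inside its own transaction.
        }
    }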
-
at your service

Nayan Ambali
+91 9591996042
skype: nayangambali

On Sat, Oct 19, 2019 at 2:20 PM Joseph Cabral <joseph.cabra...@gmail.com> wrote:

Hi Everyone,

I would like to ask if anyone has done load testing of the MifosX/Fineract scheduler jobs, specifically the post interest to savings job?

We created a new Savings Product (ADB Monthly - 10% Interest) with the following settings:
Nominal interest rate (annual): 10%
Balance required for interest calculation: $1,000.00
Interest compounding period: Monthly
Interest posting period: Monthly
Interest calculated using: Average Daily Balance

We populated the m_savings_account table using the savings product above with 1.2M new savings accounts, into which we then deposited an initial balance of $10,000 each. We then edited the post interest to savings job to post the interest even though it was not yet the end of the month.

On consecutive tests, we averaged around 15 to 20 hours to complete the job.

We are using a 4vCPU 128GB server to do the load testing. We deployed MifosX/Fineract in a Docker container with Tomcat 7. The MySQL database is deployed in a separate Docker container on the same machine.

Any tips or ideas on what we can do to improve the run time of the job? Our target for 1.2M savings accounts is less than 1 hour.

Regards,

Joseph

--
Ed Cable
President/CEO, Mifos Initiative
edca...@mifos.org | Skype: edcable | Mobile: +1.484.477.8649

Collectively Creating a World of 3 Billion Maries | http://mifos.org
<http://facebook.com/mifos> <http://www.twitter.com/mifos>