vishesh92 opened a new pull request, #8903:
URL: https://github.com/apache/cloudstack/pull/8903
### Description
This PR fixes the errors that occur when the resource count increment/decrement methods wait for a lock on the domain tables while `ResourceCountCheckTask` is running. The issue appears when `innodb_lock_wait_timeout` is much shorter than the time `recalculateDomainResourceCount` takes to complete. (See the steps below to reproduce the error.)
```java
com.cloud.utils.exception.CloudRuntimeException: DB Exception on: com.mysql.cj.jdbc.ClientPreparedStatement: SELECT resource_count.id, resource_count.type, resource_count.account_id, resource_count.domain_id, resource_count.count, resource_count.tag FROM resource_count WHERE resource_count.id IN (33,4785,3513,4845) FOR UPDATE
	at com.cloud.utils.db.GenericDaoBase.searchIncludingRemoved(GenericDaoBase.java:438)
	at com.cloud.utils.db.GenericDaoBase.searchIncludingRemoved(GenericDaoBase.java:366)
	at com.cloud.utils.db.GenericDaoBase.search(GenericDaoBase.java:355)
	at com.cloud.utils.db.GenericDaoBase.lockRows(GenericDaoBase.java:341)
	.....
```
We do this by removing unnecessary locks and simplifying the count updates. Currently, calculating the resource count for the root domain takes a lock on the entire table.
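As a rough in-process analogy (not CloudStack code), the sketch below uses `java.util.concurrent.locks.ReentrantLock` as a stand-in for InnoDB locks and `tryLock(timeout, unit)` as a stand-in for `innodb_lock_wait_timeout`. A background thread holds a lock for the whole "recalculation" while the caller tries to take either the same lock or a different one. `LockScopeDemo`, `rowLock` and `acquiredWithin` are hypothetical names; the row ids are borrowed from the stack trace above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class LockScopeDemo {

    // One lock for the whole table versus one lock per resource_count row (hypothetical model).
    final ReentrantLock tableLock = new ReentrantLock();
    private final Map<Long, ReentrantLock> rowLocks = new ConcurrentHashMap<>();

    ReentrantLock rowLock(long rowId) {
        return rowLocks.computeIfAbsent(rowId, id -> new ReentrantLock());
    }

    /**
     * Keeps `held` locked on a background thread (like a long recalculation) and
     * reports whether the calling thread gets `wanted` within waitMillis, which
     * plays the role of innodb_lock_wait_timeout.
     */
    static boolean acquiredWithin(ReentrantLock held, ReentrantLock wanted, long waitMillis)
            throws InterruptedException {
        CountDownLatch locked = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread recalculation = new Thread(() -> {
            held.lock();
            try {
                locked.countDown();
                release.await(); // hold the lock until the waiter has finished
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                held.unlock();
            }
        });
        recalculation.start();
        locked.await(); // make sure the "recalculation" owns the lock first
        boolean acquired = wanted.tryLock(waitMillis, TimeUnit.MILLISECONDS);
        if (acquired) {
            wanted.unlock();
        }
        release.countDown();
        recalculation.join();
        return acquired;
    }

    public static void main(String[] args) throws InterruptedException {
        LockScopeDemo db = new LockScopeDemo();
        // Table lock: an increment on any row times out while the recalculation runs.
        System.out.println(acquiredWithin(db.tableLock, db.tableLock, 50));       // false
        // Row locks: only increments on the locked row have to wait.
        System.out.println(acquiredWithin(db.rowLock(33), db.rowLock(33), 50));   // false
        System.out.println(acquiredWithin(db.rowLock(33), db.rowLock(4785), 50)); // true
    }
}
```

With a single table-wide lock, every waiter gives up once its timeout expires; with row-level locks, only waiters on the rows currently being recalculated are affected.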
This PR also splits the single domain count calculation transaction into multiple smaller transactions. The domain count calculation process is broken up as follows:
1. Calculate the resource count for all accounts in a domain.
2. Calculate the resource count for all child domains of the domain.
3. In a transaction, fetch the counts of the child domains and accounts, and update the domain's count if required.
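The three steps can be sketched in memory as follows. This is a minimal sketch under stated assumptions, not the PR's implementation: `Account`, `Domain` and `SplitRecalculation` are hypothetical types, an account's count is assumed to be the number of VMs it owns, and each domain's `ReentrantLock` stands in for one short database transaction. It shows the lock being held only for the final read-and-update step of each domain, never across the whole recursive calculation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model types; CloudStack keeps these counts in the resource_count table.
class Account {
    final List<String> vms = new ArrayList<>(); // resources owned by the account
    volatile long count;                        // stored account resource count
}

class Domain {
    final List<Account> accounts = new ArrayList<>();
    final List<Domain> children = new ArrayList<>();
    final ReentrantLock lock = new ReentrantLock(); // stands in for one short transaction
    volatile long count;                            // stored domain resource count
}

class SplitRecalculation {

    /** Recalculates a domain's count bottom-up and returns the stored value. */
    static long recalculate(Domain domain) {
        // Step 1: recalculate the count of every account in the domain.
        for (Account account : domain.accounts) {
            account.count = account.vms.size();
        }
        // Step 2: recalculate every child domain; each one takes only its own lock.
        for (Domain child : domain.children) {
            recalculate(child);
        }
        // Step 3: in one short critical section, read the account and child domain
        // counts and update this domain's count only if it differs.
        domain.lock.lock();
        try {
            long total = 0;
            for (Account account : domain.accounts) {
                total += account.count;
            }
            for (Domain child : domain.children) {
                total += child.count;
            }
            if (total != domain.count) {
                domain.count = total;
            }
            return domain.count;
        } finally {
            domain.lock.unlock();
        }
    }

    public static void main(String[] args) {
        Domain root = new Domain();
        Domain child = new Domain();
        root.children.add(child);

        Account admin = new Account();
        admin.vms.add("vm-1");
        root.accounts.add(admin);

        Account user = new Account();
        user.vms.add("vm-2");
        user.vms.add("vm-3");
        child.accounts.add(user);

        System.out.println(recalculate(root)); // 3
        System.out.println(child.count);       // 2
    }
}
```

In this sketch, a concurrent update on one domain only has to wait for that domain's short step 3, not for the recalculation of the entire domain tree.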
### Types of changes
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] Enhancement (improves an existing feature and functionality)
- [ ] Cleanup (Code refactoring and cleanup, that may add test cases)
### Feature/Enhancement Scale or Bug Severity
#### Feature/Enhancement Scale
- [ ] Major
- [ ] Minor
#### Bug Severity
- [ ] BLOCKER
- [ ] Critical
- [ ] Major
- [ ] Minor
- [ ] Trivial
### Screenshots (if appropriate):
### How Has This Been Tested?
1. Set up multiple domains and networks, and update their limits. I used the command below:
```bash
csbench -create -domain -network -limits
```
```
# csbench-config
numdomains = 10
numnetworks = 1
numvms = 100
startvm = false # For faster creation of VMs
```
2. Check how long the resource count calculation takes to run. To trigger the calculation manually, run this command:
```bash
time cmk update resourcecount domainid=1
```
3. Set `innodb_lock_wait_timeout` to a value a few seconds lower than the time the above request took to complete.
```sql
SET GLOBAL innodb_lock_wait_timeout=3;
```
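4. Restart the management server for the `innodb_lock_wait_timeout` change to take effect. A `SET GLOBAL` change only applies to sessions opened after it, and the management server reuses its pooled database connections.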
5. Run the commands below:
```bash
csbench -create -vm -workers=50
csbench -teardown -vm -workers=50
```
In parallel with the above commands, run `cmk update resourcecount domainid=1` to trigger a resource count recalculation while VMs are being created or destroyed.
6. Check the logs for `ClientPreparedStatement`.
```bash
grep "ClientPreparedStatement" vmops.log
```
#### Results
##### With patch - creation of VM in stopped state
```
+----------+-------+-------+--------+-------+--------+-----------------+-----------------+-----------------+
|   TYPE   | COUNT |  MIN  |  MAX   |  AVG  | MEDIAN | 90TH PERCENTILE | 95TH PERCENTILE | 99TH PERCENTILE |
+----------+-------+-------+--------+-------+--------+-----------------+-----------------+-----------------+
| vm - All |  1000 | 1.708 | 12.123 | 3.874 |   3.46 |           5.428 |           6.662 |           8.614 |
+----------+-------+-------+--------+-------+--------+-----------------+-----------------+-----------------+
```
```
+------------------+-------+--------+-------+--------+--------+-----------------+-----------------+-----------------+
|       TYPE       | COUNT |  MIN   |  MAX  |  AVG   | MEDIAN | 90TH PERCENTILE | 95TH PERCENTILE | 99TH PERCENTILE |
+------------------+-------+--------+-------+--------+--------+-----------------+-----------------+-----------------+
| vm-destroy - All |  1000 | 10.286 | 21.86 | 17.987 | 15.467 |          21.518 |          21.589 |          21.779 |
+------------------+-------+--------+-------+--------+--------+-----------------+-----------------+-----------------+
```
##### Without patch
```
+-----------------+-------+-------+--------+-------+--------+-----------------+-----------------+-----------------+
|      TYPE       | COUNT |  MIN  |  MAX   |  AVG  | MEDIAN | 90TH PERCENTILE | 95TH PERCENTILE | 99TH PERCENTILE |
+-----------------+-------+-------+--------+-------+--------+-----------------+-----------------+-----------------+
| vm - All        |  1000 | 2.039 | 17.463 | 5.656 |   4.77 |          10.484 |          11.758 |          13.645 |
| vm - Successful |   988 | 2.039 | 17.463 |  5.67 |  4.773 |          10.489 |          11.791 |          13.753 |
| vm - Failed     |    12 | 3.181 |  5.414 | 4.493 |  4.679 |            5.21 |           5.313 |           5.313 |
+-----------------+-------+-------+--------+-------+--------+-----------------+-----------------+-----------------+
```
```
+------------------+-------+--------+--------+--------+--------+-----------------+-----------------+-----------------+
|       TYPE       | COUNT |  MIN   |  MAX   |  AVG   | MEDIAN | 90TH PERCENTILE | 95TH PERCENTILE | 99TH PERCENTILE |
+------------------+-------+--------+--------+--------+--------+-----------------+-----------------+-----------------+
| vm-destroy - All |   996 | 10.295 | 29.176 | 20.111 | 21.417 |          21.655 |           22.27 |          28.691 |
+------------------+-------+--------+--------+--------+--------+-----------------+-----------------+-----------------+
```
--
This is an automated message from the Apache Git Service.