[jira] [Comment Edited] (CASSANDRA-16951) Dtest cluster reusage

Berenguer Blasi (Jira) Wed, 15 Sep 2021 01:30:04 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-16951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17414819#comment-17414819
 ]


Berenguer Blasi edited comment on CASSANDRA-16951 at 9/15/21, 8:29 AM:
-----------------------------------------------------------------------

*How to test locally std*

{{pytest --log-cli-level=DEBUG --cassandra-dir=../nodeReusage 
paging_test.py::TestPagingDatasetChanges}}
Notice how 3 nodes are started and the next tests reuse them. Debug traces help 
you track the behaviour. Now comment out the {{reuse_cluster}} annotations and 
notice how a new cluster is started every time.

You also have to test it by passing in a list of tests from a file. You can use 
this list or any other:
{noformat}
user_functions_test.py::TestUserFunctions::test_default_aggregate
user_functions_test.py::TestUserFunctions::test_aggregate_udf
user_functions_test.py::TestUserFunctions::test_udf_with_udt
user_functions_test.py::TestUserFunctions::test_udf_with_udt_keyspace_isolation
user_functions_test.py::TestUserFunctions::test_aggregate_with_udt_keyspace_isolation
auditlog_test.py::TestAuditlog::test_archiving
{noformat}

{{pytest --log-cli-level=DEBUG --cassandra-dir=../nodeReusage `cat 
/tmp/splits/test_list.txt`}}
You should see nodes being reused for every test except the last one.
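To predict where reuse can kick in for a given test list, the sketch below groups consecutive pytest node IDs by test class. It assumes that a cluster is only shared between consecutive tests of the same class, which is my reading of the behaviour rather than something the patch guarantees. The helper names {{class_of}} and {{reuse_groups}} are made up for this illustration.

```python
from itertools import groupby

# The test list from the comment above, inlined so the sketch is self-contained.
TEST_LIST = """\
user_functions_test.py::TestUserFunctions::test_default_aggregate
user_functions_test.py::TestUserFunctions::test_aggregate_udf
user_functions_test.py::TestUserFunctions::test_udf_with_udt
user_functions_test.py::TestUserFunctions::test_udf_with_udt_keyspace_isolation
user_functions_test.py::TestUserFunctions::test_aggregate_with_udt_keyspace_isolation
auditlog_test.py::TestAuditlog::test_archiving
"""


def class_of(node_id):
    """Return 'module.py::Class' for a node ID like 'module.py::Class::test'."""
    return "::".join(node_id.split("::")[:2])


def reuse_groups(node_ids):
    """Group consecutive node IDs that belong to the same test class."""
    return [(key, list(group)) for key, group in groupby(node_ids, key=class_of)]


node_ids = [line.strip() for line in TEST_LIST.splitlines() if line.strip()]
for cls, tests in reuse_groups(node_ids):
    # Assumption: a group with more than one test is a candidate for reuse.
    print(f"{cls}: {len(tests)} test(s)")
```

With the list above this yields one group of five {{TestUserFunctions}} tests and a separate group holding the single {{TestAuditlog}} test, which matches the expectation that the last test does not reuse the earlier nodes.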


*How to test locally with the Jenkins docker script*
{{sudo ./cassandra-builds/build-scripts/cassandra-dtest-pytest-docker.sh apache 
cassandra-4.0 https://github.com/bereng/cassandra-dtest.git nodeReusage 
https://github.com/bereng/cassandra-builds.git nodeReusage 
apache/cassandra-testing-ubuntu2004-java11 dtest 1/64}}

Use the debug output to see how nodes are being reused when applicable.
Use different splits (the {{1/64}} argument) to see how the splitter is working.
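As a rough mental model of a split argument such as {{1/64}}, the sketch below cuts an ordered test list into contiguous chunks and returns the requested one. This is only an illustration: the actual algorithm used by the docker script is not described here and may differ, and {{parse_split}} and {{select_split}} are hypothetical names. Contiguous chunks are chosen because they keep tests of one class next to each other, which is what node reusage needs.

```python
def parse_split(spec):
    """Parse 'index/total' (1-based index), e.g. '1/64'."""
    index, total = (int(part) for part in spec.split("/"))
    if not 1 <= index <= total:
        raise ValueError(f"invalid split spec: {spec!r}")
    return index, total


def select_split(tests, spec):
    """Return the contiguous chunk of `tests` that belongs to split `spec`.

    Assumption: chunks are contiguous and differ in size by at most one test.
    """
    index, total = parse_split(spec)
    size, extra = divmod(len(tests), total)
    # The first `extra` splits each take one additional test.
    start = (index - 1) * size + min(index - 1, extra)
    end = start + size + (1 if index <= extra else 0)
    return tests[start:end]


# Hypothetical test names, just to exercise the function.
tests = [f"test_{i}" for i in range(10)]
for i in range(1, 4):
    print(f"{i}/3", select_split(tests, f"{i}/3"))
```

Running it with different split specs shows each split receiving a different, non-overlapping slice of the list.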


*How to test on Circle CI*
Example runs: 
[j8|https://app.circleci.com/pipelines/github/bereng/cassandra/420/workflows/27373857-3978-4db1-9cca-778f546db23f]
 & 
[j11|https://app.circleci.com/pipelines/github/bereng/cassandra/420/workflows/d70dc386-7831-4531-8eb8-1868d6e6c532]

- Repeat feature: works, and we can see it reuses the cluster in the 
[log|https://app.circleci.com/pipelines/github/bereng/cassandra/420/workflows/27373857-3978-4db1-9cca-778f546db23f/jobs/3807/parallel-runs/0/steps/0-104]
- Upgrade tests work
- Dtests work and we can see nodes are being reused in the 
[logs|https://app.circleci.com/pipelines/github/bereng/cassandra/420/workflows/27373857-3978-4db1-9cca-778f546db23f/jobs/3802/parallel-runs/2/steps/2-105].
 For this particular node the run takes only 1m, which makes the reuse very 
obvious.
- The splitter can be seen working correctly, as each Circle node is running 
different tests.
- There seems to be a small bug in j11 when cleaning keyspaces; a fix still 
has to be committed.

*How to test on Jenkins CI*
Jenkins job [here|https://ci-cassandra.apache.org/job/CASSANDRA-16951-dtest/4/]
It's difficult to assess the benefits as results are split per node. But MV 
tests went from 1h to 45m. By looking at the output of the first splits we can 
see that node reusage is effective.

Run time seems to have gone from a bit over 1h to 30m for roughly the same 
number of tests (diff=8 tests, pending rebase). That is a large drop, which 
might also be down to Jenkins being fresh after a restart. Still, the individual 
test class times are down when using node reusage.

*Conclusion*
Overall, test workers where node reusage is happening really speed up. That 
should free workers sooner to run other workloads. The run time of a specific 
test run might not be cut down at the moment because of long poles, but now we 
have a tool to try to cut those shorter.
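The difference between freeing workers and shortening the overall run can be shown with a little arithmetic. The per-split durations below are invented purely for illustration; they are not measurements from the runs linked above.

```python
# Hypothetical per-split durations in minutes, before and after node reusage.
before = {"split-1": 60, "split-2": 30, "split-3": 60}
after = {"split-1": 60, "split-2": 10, "split-3": 45}


def wall_time(durations):
    """The run finishes when the slowest split (the long pole) finishes."""
    return max(durations.values())


def worker_time(durations):
    """Total worker minutes consumed across all splits."""
    return sum(durations.values())


print("wall time:  ", wall_time(before), "->", wall_time(after))
print("worker time:", worker_time(before), "->", worker_time(after))
```

In this made-up case the wall time stays at 60m because split-1 is untouched, while total worker time drops from 150m to 115m. That is the effect described above: workers are freed sooner even though the long pole still determines the run time.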


> Dtest cluster reusage
> ---------------------
>
>                 Key: CASSANDRA-16951
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16951
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Test/dtest/python
>            Reporter: Berenguer Blasi
>            Assignee: Berenguer Blasi
>            Priority: Normal
>             Fix For: 4.0.x
>
>
> Dtests are very heavy, and in some instances most of the time is spent 
> restarting nodes in between test methods. Not all of them, but many seem 
> able to benefit from reusing a common cluster, sparing the restarts. 
> Obviously that is not the case for tests that manipulate the nodes 
> themselves during the test. The ones that follow a "set up nodes, run test" 
> pattern seem to benefit greatly in terms of execution time.
> Some classes' run time can be cut from 10m to 1.5m. Others only from 30m to 
> 25m. But taking a 5m shave and considering it will probably get run * 
> with/without vnodes * j8/j11/j8j11 * 4.0/trunk turns the 5m cut into a 60m 
> cut. That should be a nice reduction in CI usage. Unfortunately run time will 
> mostly remain the same until we have a majority of tests reusing nodes, as 
> the 'longest pole' will be the determining factor.
> How does it work? It's an opt-in. Annotate the first test with 
> {{@reuse_cluster(new_cluster=True)}} and the following ones with 
> {{@reuse_cluster}}. A best effort to reuse the cluster will be made. Stop 
> using the annotation at any test method and a new cluster will be started.
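The opt-in flow in the description can be mimicked with a toy decorator and planner. This is not the implementation in the dtest branch, which is not shown here; it is a minimal sketch that only reproduces the described rules, and {{plan_cluster_starts}} is a hypothetical helper introduced for the example.

```python
def reuse_cluster(func=None, *, new_cluster=False):
    """Toy stand-in for the annotation: mark a test as willing to share a cluster."""
    def mark(f):
        f.reuse_cluster = True
        f.new_cluster = new_cluster
        return f
    # Support both @reuse_cluster and @reuse_cluster(new_cluster=True).
    return mark(func) if callable(func) else mark


def plan_cluster_starts(tests):
    """Return the names of the tests that would trigger a fresh cluster start."""
    starts = []
    have_shared_cluster = False
    for test in tests:
        wants_reuse = getattr(test, "reuse_cluster", False)
        forces_new = getattr(test, "new_cluster", False)
        if not (wants_reuse and have_shared_cluster and not forces_new):
            starts.append(test.__name__)
        # Only clusters started or kept by annotated tests are shareable.
        have_shared_cluster = wants_reuse
    return starts


@reuse_cluster(new_cluster=True)
def test_a():
    pass


@reuse_cluster
def test_b():
    pass


@reuse_cluster
def test_c():
    pass


def test_d():  # no annotation: gets its own cluster
    pass


print(plan_cluster_starts([test_a, test_b, test_c, test_d]))
# prints ['test_a', 'test_d']
```

Only {{test_a}} and {{test_d}} start a cluster here; {{test_b}} and {{test_c}} ride on the one started by {{test_a}}.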


