[jira] [Comment Edited] (CASSANDRA-17043) CircleCI dtest multiplexer with MIDRES needs more resources

Jira Mon, 18 Oct 2021 08:05:06 -0700


    [ https://issues.apache.org/jira/browse/CASSANDRA-17043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430051#comment-17430051 ]


Andres de la Peña edited comment on CASSANDRA-17043 at 10/18/21, 3:04 PM:
--------------------------------------------------------------------------

{quote}In that sense it makes sense to me to raise the parallelism for the 
MIDRES multiplexer for the upgrade in-jvm tests, for example. Why? Because the 
whole suite is 10-20 test classes, but the default number of runs in a loop, which 
I would assume most people will use, is 100. So if we generalize, this means to me 
10 times more execution time, if we think that most tests need a similar amount 
of time to run (do they?).
{quote}
I think that the amount of time to run tests varies wildly, even for the same 
type of test. Not only do we have very different tests, but we can also multiplex 
either a single method or its containing suite. If the suite has ten methods, that 
run will take ten times longer. So I think it's difficult to adjust parallelism for 
an expected run time if we don't know which test is going to be repeated, 
unlike the relatively fixed sets of tests that we have for the standard 
jobs. I guess that the choice of parallelism for the multiplexer is going to be 
based more on how many resources we want to invest than on the running time 
that we aim to achieve.
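To make the point about run time concrete, here is a rough back-of-the-envelope model. It is only a sketch under simplifying assumptions of mine (iterations split evenly across runners, a fixed startup cost per runner, every run taking the same time); the function name and all numbers are hypothetical and not taken from the CircleCI config. It shows that, at the same parallelism, multiplexing a ten-method suite instead of a single method multiplies the wall time accordingly.

```python
import math

def multiplexer_wall_time(run_minutes, iterations, parallelism, startup_minutes):
    """Rough wall-clock estimate (minutes) for a repeated-test job.

    Assumes the iterations are split as evenly as possible across runners and
    that every runner pays a fixed startup cost before its first run.
    """
    runs_per_runner = math.ceil(iterations / parallelism)
    return startup_minutes + runs_per_runner * run_minutes

# Hypothetical numbers: a single test method takes ~1 minute per run, and the
# suite containing it has ten similar methods (~10 minutes per run).
method = multiplexer_wall_time(run_minutes=1, iterations=100, parallelism=25, startup_minutes=2)
suite = multiplexer_wall_time(run_minutes=10, iterations=100, parallelism=25, startup_minutes=2)
print(method, suite)
```

With these made-up inputs the method takes about 6 minutes and the suite about 42, so a parallelism tuned for one would be a poor fit for the other.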

The current parallelisms of 4/25/100 for low/mid/high resources save some of 
the overhead of starting new runners in the low and mid configs, although I don't 
know how noticeable that is in practice. Another reason not to use a very 
high parallelism is that, as I understand it, the maximum number of concurrent 
runners is limited per organization, so a job with a very high parallelism can 
starve other users. I assume that when users with access to high resources 
choose to run with the low/mid configs they are not in a big hurry, and they can 
wait a bit longer in order not to exhaust the resources for other users. Having 
a very high parallelism in these configs could make this type of non-invasive 
run more difficult to do.
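The trade-off between startup overhead, wall time and concurrent runners can be sketched the same way. Again this is a simplified model with hypothetical inputs (fixed startup cost per active runner, runners that stop once their share of iterations is done); it is not how CircleCI actually schedules or bills jobs. It compares the 4/25/100 parallelisms for 100 iterations of a short test.

```python
import math

def job_cost(run_minutes, iterations, parallelism, startup_minutes):
    """Return (wall_minutes, runner_minutes, concurrent_runners).

    Assumes iterations are split as evenly as possible, each active runner
    pays one fixed startup cost, and runners stop once their share is done.
    """
    active = min(parallelism, iterations)  # runners that get at least one run
    runs_per_runner = math.ceil(iterations / active)
    wall = startup_minutes + runs_per_runner * run_minutes
    # Total runner time: every active runner's startup plus all the actual runs.
    runner_minutes = active * startup_minutes + iterations * run_minutes
    return wall, runner_minutes, active

# Hypothetical inputs: 100 iterations of a 1-minute test, 2 minutes of startup.
for parallelism in (4, 25, 100):
    print(parallelism, job_cost(1, 100, parallelism, 2))
```

Under these made-up numbers, going from 25 to 100 runners halves the wall time (6 to 3 minutes) but doubles the total runner time (150 to 300 minutes) and occupies four times as many concurrent runners, which is the cost that falls on other users sharing the organization's limit.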

That said, I have no clue about which parallelism would be better for the 
average case, although the current values have worked well for me so far. I'd be 
happy to increase the parallelism of {{repeated_dtest}} if we find it too low. 
What value would you suggest for each config?

Here are the updated patches with some multiplexer runs:
||Branch||CI low||CI mid||CI high||
|[3.0|https://github.com/apache/cassandra/compare/cassandra-3.0...adelapena:17043-3.0]|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/1057/workflows/cd90268d-8af7-4d8c-9522-a277398acdc4]|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/1051/workflows/2550a673-5fc2-4748-b1c6-1ce56b5b2002]|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/1048/workflows/9d8a08dd-0db3-4a9e-911f-eb64a87741ff]|
|[3.11|https://github.com/apache/cassandra/compare/cassandra-3.11...adelapena:17043-3.11]|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/1055/workflows/2412eeca-2a58-424c-9986-d6fb71cd13d4]|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/1054/workflows/fa78a6e8-be03-4ea8-9169-f92b03cc2a54]|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/1050/workflows/a7f29c05-ea57-40dd-84dd-e695514ca392]|
|[4.0|https://github.com/apache/cassandra/compare/cassandra-4.0...adelapena:17043-4.0]|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/1056/workflows/08393fb6-17db-4d91-b3ed-5d7800c56dc1] [j11|https://app.circleci.com/pipelines/github/adelapena/cassandra/1056/workflows/23eeb750-e163-406c-9430-aa37e67b0717]|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/1053/workflows/15d171cd-cc45-4718-ac83-90710c725097] [j11|https://app.circleci.com/pipelines/github/adelapena/cassandra/1053/workflows/d5c4ef81-04d0-4081-8a9c-f5f1753debd8]|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/1049/workflows/3047dfae-3b8f-4716-8c18-db4a2fcc8fd5] [j11|https://app.circleci.com/pipelines/github/adelapena/cassandra/1049/workflows/7de5327d-360c-407d-8ab6-0b80fb58ddaf]|
|[trunk|https://github.com/apache/cassandra/compare/trunk...adelapena:17043-trunk]|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/1060/workflows/34a75acf-e786-4b6f-a4c6-9d878e7f1351] [j11|https://app.circleci.com/pipelines/github/adelapena/cassandra/1060/workflows/51beffb0-2ccc-4ce0-b7e4-0cabac0ac223]|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/1061/workflows/4842863d-b324-4e0f-a95f-a14ec1fb66d4] [j11|https://app.circleci.com/pipelines/github/adelapena/cassandra/1061/workflows/09bef172-f992-4ab5-b945-97966246a927]|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/1062/workflows/eff4b3c0-1a90-440a-a432-4ebabe716ff4] [j11|https://app.circleci.com/pipelines/github/adelapena/cassandra/1062/workflows/8a5d32ba-eef9-4008-9d03-3f9cd59115f5]|

Of course the running times of the multiplexer jobs above would have been very 
different if I had chosen different tests to be run repeatedly.



> CircleCI dtest multiplexer with MIDRES needs more resources
> -----------------------------------------------------------
>
>                 Key: CASSANDRA-17043
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17043
>             Project: Cassandra
>          Issue Type: Task
>          Components: CI
>            Reporter: Andres de la Peña
>            Assignee: Andres de la Peña
>            Priority: Low
>             Fix For: 3.0.x, 3.11.x, 4.0.x, 4.x
>
>
> The CircleCI jobs for regular dtests have more resources in MIDRES, 
> which is necessary for some dtests to succeed reliably. However, the dtest 
> multiplexer uses the same resources for LOWRES and MIDRES.
> I think that the dtest multiplexer should always use the same resources as 
> the regular dtests. Using too few resources in the multiplexer can lead to 
> failures that don't reproduce in the regular dtest jobs, like the one we 
> found in 
> [CASSANDRA-16334|https://app.circleci.com/pipelines/github/adelapena/cassandra/1020/workflows/63908694-e4d7-40b1-9418-7a4b87826233/jobs/9422]
>  when trying to repeatedly run a resource-hungry dtest, or like [this other 
> one|https://app.circleci.com/pipelines/github/adelapena/cassandra/1020/workflows/63908694-e4d7-40b1-9418-7a4b87826233/jobs/9422]
>  while running {{test_network_topology}}.
> This happens because I forgot to update the diff patch when adding the 
> multiplexer. This doesn't affect HIGHRES because in that case the patch 
> changes the configuration of the test executor, while in MIDRES a new 
> executor is defined.
>  
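The invariant proposed in the description (the multiplexer should always use the same resources as the regular dtests) can be expressed as a simple consistency check. This is a hypothetical sketch: apart from {{repeated_dtest}}, the job names and resource class values below are placeholders, not the real contents of the CircleCI config or its LOWRES/MIDRES/HIGHRES patches. The mapping only mirrors the situation described above, where the MIDRES multiplexer was left on the LOWRES resources.

```python
# Hypothetical resource classes per config. The "dtest" key and all values are
# placeholders that only mirror the described bug, not the real config.
RESOURCES = {
    "LOWRES": {"dtest": "medium", "repeated_dtest": "medium"},
    "MIDRES": {"dtest": "large", "repeated_dtest": "medium"},
    "HIGHRES": {"dtest": "xlarge", "repeated_dtest": "xlarge"},
}

def mismatched_configs(resources, regular="dtest", repeated="repeated_dtest"):
    """Return, sorted, the configs where the multiplexer uses different
    resources than the regular job it repeats."""
    return sorted(
        config
        for config, jobs in resources.items()
        if jobs[regular] != jobs[repeated]
    )

print(mismatched_configs(RESOURCES))
```

A check like this would flag only MIDRES for the placeholder mapping, which matches why HIGHRES is unaffected when the patch modifies the existing executor rather than defining a new one.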



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
