[ 
https://issues.apache.org/jira/browse/CASSANDRA-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17074125#comment-17074125
 ] 

David Capwell commented on CASSANDRA-15686:
-------------------------------------------

bq. Python dtests do not run successfully (200-300 failures) on medium 
instances, they seem to only run with small flaky failures on large instances 
or higher

This is correct and should be improved.

bq. Unit tests do not seem to parallelize optimally, number of test runners do 
not reflect the available CPUs on the container. Ideally if # of runners == # 
of CPUs, build time is improved, on any type of instances.
bq. For instance when using the current configuration, running on medium 
instances, build will use 1 junit test runner, but 2 CPUs are available. If 
using 2 runners, the build time is reduced from 19min (at the current main 
config of parallelism=4) to 12min.

I am only speculating, but I thought some tests spin up the network and many 
tests share disk, and I don't know whether we isolate the paths.  If neither 
turns out to be a problem, then it should be a simple change to bump the 
number of runners for unit tests (definitely not for JVM dtests).
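
For illustration only, here is a minimal sketch of matching the runner count to the visible CPUs. The {{-Dtest.runners}} property is the one named in the ticket. The helper names {{detect_runners}} and {{runners_arg}} are hypothetical, and {{ant test}} as the target is an assumption. It does not address whether the tests are safe to run side by side.

```shell
#!/bin/sh
# Hypothetical helper: report how many CPUs the shell can see.
# Caveat: inside a container this may be the host's CPU count rather than
# the container's quota, so the value may need to be hardcoded instead.
detect_runners() {
  cpus=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
  # Fall back to 1 if the value is empty or not a plain number.
  case "$cpus" in
    ''|*[!0-9]*) cpus=1 ;;
  esac
  [ "$cpus" -ge 1 ] || cpus=1
  echo "$cpus"
}

# Format the property mentioned in the ticket for a given runner count.
runners_arg() {
  echo "-Dtest.runners=$1"
}

# Print (not run) the resulting command; "ant test" is an assumed target.
echo "ant test $(runners_arg "$(detect_runners)")"
```

Printing the command instead of running it keeps the sketch free of side effects. A fixed value such as {{runners_arg 2}} may be the safer choice on CI if the detected count does not match the container's allocation.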

bq. There are some typos in the file, some dtests say "Run Unit Tests" but they 
are JVM dtests (see here, here)

Glad to review a patch to fix it =)

bq. Do the Python dtests run successfully for anyone on medium instances? If 
not, would it make sense to bump them to large so that they can be run 
successfully?

I was under the impression the larger instances were not in the free tier; I 
know xlarge is not, and I have never tried large.  My only concern would be the 
impact on the free tier of CircleCI; if there is no impact then +1.

bq. Additionally, it seems this default config file (config.yml) is not as well 
maintained as the config-2_1.yml (+its lowres/highres) version in the same 
folder (from CASSANDRA-14806). What is the reasoning for maintaining these 2 
versions of the build? Could the better maintained version be used as the 
default? We could generate a lowres version of the new config-2_1.yml, and 
rename it config.yml so that it gets picked up by CircleCI automatically 
instead of the current default.

Sorry, I don't follow.

{code}
$ git pull --rebase upstream trunk
From github.com:apache/cassandra
 * branch                  trunk      -> FETCH_HEAD
Already up to date.
Current branch trunk is up to date.
$ cd .circleci/
$ diff config.yml config.yml.LOWRES
$
{code}

config.yml == config.yml.LOWRES.  We modify config-2_1.yml, call generate.sh, 
then copy config.yml.LOWRES to config.yml.  So the only file we should be 
maintaining is config-2_1.yml.
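
The workflow above could be sketched roughly as follows. The function name {{regenerate_default_config}} is hypothetical. The sketch also assumes that {{generate.sh}} can be invoked without arguments and writes {{config.yml.LOWRES}} next to {{config-2_1.yml}}; check the script itself before relying on this.

```shell
#!/bin/sh
# Sketch of the regenerate-and-copy workflow described in the comment.
# Assumption: generate.sh takes no arguments and writes config.yml.LOWRES
# into the same directory as config-2_1.yml.
regenerate_default_config() {
  circleci_dir=$1
  (
    cd "$circleci_dir" || exit 1
    ./generate.sh || exit 1                    # rebuild config.yml.LOWRES
    cp config.yml.LOWRES config.yml || exit 1  # make LOWRES the default config
    # diff exits 0 only when both files are identical
    diff config.yml config.yml.LOWRES >/dev/null && echo "config.yml is in sync"
  )
}
```

From the repository root this would be called as {{regenerate_default_config .circleci}}. The subshell keeps the caller's working directory unchanged.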

> Improvements in circle CI default config
> ----------------------------------------
>
>                 Key: CASSANDRA-15686
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15686
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Build
>            Reporter: Kevin Gallardo
>            Priority: Normal
>
> I have been looking at and played around with the [default CircleCI 
> config|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml], 
> a few comments/questions regarding the following topics:
>  * Python dtests do not run successfully (200-300 failures) on {{medium}} 
> instances, they seem to only run with small flaky failures on {{large}} 
> instances or higher
>  * Python Upgrade tests:
>  ** Do not seem to run without many failures on any instance types / any 
> parallelism setting
>  ** Do not seem to parallelize well, it seems each container is going to 
> download multiple C* versions
>  ** Additionally it seems the configuration is not up to date, as currently 
> we get errors because {{JAVA8_HOME}} is not set
>  * Unit tests do not seem to parallelize optimally, number of test runners do 
> not reflect the available CPUs on the container. Ideally if # of runners == # 
> of CPUs, build time is improved, on any type of instances.
>  ** For instance when using the current configuration, running on medium 
> instances, build will use 1 junit test runner, but 2 CPUs are available. If 
> using 2 runners, the build time is reduced from 19min (at the current main 
> config of parallelism=4) to 12min.
>  * There are some typos in the file, some dtests say "Run Unit Tests" but 
> they are JVM dtests (see 
> [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1077],
>  
> [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1386])
> So some ways to process these would be:
>  * Do the Python dtests run successfully for anyone on {{medium}} instances? 
> If not, would it make sense to bump them to {{large}} so that they can be run 
> successfully?
>  * Does anybody ever run the python upgrade tests on CircleCI and what is the 
> configuration that makes it work?
>  * Would it make sense to either hardcode the number of test runners in the 
> unit tests with `-Dtest.runners` in the config file to reflect the number of 
> CPUs on the instances, or change the build so that it is able to detect the 
> appropriate number of core available automatically?
> Additionally, it seems this default config file (config.yml) is not as well 
> maintained as the 
> [{{config-2_1.yml}}|https://github.com/apache/cassandra/blob/trunk/.circleci/config-2_1.yml]
>  (+its lowres/highres) version in the same folder (from CASSANDRA-14806). 
> What is the reasoning for maintaining these 2 versions of the build? Could 
> the better maintained version be used as the default? We could generate a 
> lowres version of the new config-2_1.yml, and rename it {{config.yml}} so 
> that it gets picked up by CircleCI automatically instead of the current 
> default.


