[ 
https://issues.apache.org/jira/browse/CASSANDRA-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17083462#comment-17083462
 ] 

David Capwell commented on CASSANDRA-15686:
-------------------------------------------

[~mck] your link is a JVM dtest 
(org.apache.cassandra.distributed.test.BootstrapTest.bootstrapTest), which would 
require more work to support concurrent runners (doable).  That test appears to 
start a cluster, tear it down, then start another one; my guess is that the 
second one fails because the first isn't fully dead yet.  This looks like a bug to me.

bq. This indicates that the unit tests were runner-safe at some point in the 
past, and have since been broken…

https://issues.apache.org/jira/browse/CASSANDRA-13078?focusedCommentId=16039394&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16039394

bq. any thoughts on how this would interact with (for example) circleci or asf 
jenkins? -- Jeff
bq. It's fine we would just override it for CircleCI so it's 1. For ASF Jenkins 
it really depends on how big those boxes, but I am pretty sure we can pick a 
number > 1. A quad-core box can handle 4 just fine. -- Ariel

Looks like CircleCI has always used 1 runner.  They talk about updating 
Jenkins, but that doesn't appear to have happened.  If CI never ran with 
runners > 1 and this was only used locally a few times, then I wouldn't call 
this a regression: the failures are not consistent, so it is plausible they 
were simply not seen the few times this was used.

bq. It would be nice to return this functionality (automatic runner 
calculation) to build.xml in trunk.

If builds get faster, then +1 from me.
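
For reference, a minimal sketch of what an automatic runner calculation could look like. This is not the logic of the get-cores/get-mem ant targets; the class name, the memory-per-runner budget, and the use of the JVM heap limit as a stand-in for container memory are all assumptions for illustration.

```java
// Hypothetical sketch, not taken from Cassandra's build.xml.
final class RunnerCountSketch
{
    // Caps the runner count by CPU count and by how many runners fit in
    // memory; never returns less than 1.
    static int runners(int cores, long memoryBytes, long bytesPerRunner)
    {
        long byMemory = memoryBytes / bytesPerRunner;
        return (int) Math.max(1L, Math.min((long) cores, byMemory));
    }

    public static void main(String[] args)
    {
        int cores = Runtime.getRuntime().availableProcessors();
        // Assumption: JVM heap limit used as a rough stand-in for box memory.
        long memory = Runtime.getRuntime().maxMemory();
        // Assumption: 1 GiB per runner, purely illustrative.
        long perRunner = 1L << 30;
        System.out.println("-Dtest.runners=" + runners(cores, memory, perRunner));
    }
}
```

CI could still override the computed value explicitly with -Dtest.runners, as CircleCI does today.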

bq. Are the breakages just around a few code changes that have been introduced 
since the get-cores and get-mem ant targets were added? for example running 
multiple nodes on one IP address but with different ports?

org.apache.cassandra.net.ConnectionTest#doTestManual doesn't use random ports, 
so it would conflict with anything else trying to use the default port (not 
sure what does that).  Changing that test doesn't look hard, but we would also 
need to know which other tests do the same.
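
A minimal sketch of the usual way to avoid a fixed port, assuming the test can accept a port chosen at runtime. The helper class below is hypothetical and not part of Cassandra or of ConnectionTest.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;

// Hypothetical helper: asks the OS for a free port instead of hard-coding
// one, so that concurrent runners do not collide on the same port.
final class FreePortSketch
{
    // Binding to port 0 lets the kernel pick an unused ephemeral port.
    static int pickFreePort() throws IOException
    {
        try (ServerSocket socket = new ServerSocket(0, 1, InetAddress.getLoopbackAddress()))
        {
            return socket.getLocalPort();
        }
    }

    public static void main(String[] args) throws IOException
    {
        System.out.println("port for this runner: " + pickFreePort());
    }
}
```

The port is released before the caller binds it, so a small race with other runners remains; binding the real server socket to port 0 directly would avoid that.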

Are there other tests?  Not sure, but I wouldn't be opposed to changing to 
concurrent runners and marking the failures as blockers for alpha (like we do 
for flaky tests).

> Improvements in circle CI default config
> ----------------------------------------
>
>                 Key: CASSANDRA-15686
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15686
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Build
>            Reporter: Kevin Gallardo
>            Priority: Normal
>
> I have been looking at and playing around with the [default CircleCI 
> config|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml], 
> and have a few comments/questions regarding the following topics:
>  * Python dtests do not run successfully (200-300 failures) on {{medium}} 
> instances, they seem to only run with small flaky failures on {{large}} 
> instances or higher
>  * Python Upgrade tests:
>  ** Do not seem to run without many failures on any instance types / any 
> parallelism setting
>  ** Do not seem to parallelize well, it seems each container is going to 
> download multiple C* versions
>  ** Additionally it seems the configuration is not up to date, as currently 
> we get errors because {{JAVA8_HOME}} is not set
>  * Unit tests do not seem to parallelize optimally: the number of test runners 
> does not reflect the available CPUs on the container. Ideally, # of runners == # 
> of CPUs, which improves build time on any type of instance.
>  ** For instance when using the current configuration, running on medium 
> instances, build will use 1 junit test runner, but 2 CPUs are available. If 
> using 2 runners, the build time is reduced from 19min (at the current main 
> config of parallelism=4) to 12min.
>  * There are some typos in the file, some dtests say "Run Unit Tests" but 
> they are JVM dtests (see 
> [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1077],
>  
> [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1386])
> So some ways to process these would be:
>  * Do the Python dtests run successfully for anyone on {{medium}} instances? 
> If not, would it make sense to bump them to {{large}} so that they can be run 
> successfully?
>  * Does anybody ever run the python upgrade tests on CircleCI and what is the 
> configuration that makes it work?
>  * Would it make sense to either hardcode the number of test runners in the 
> unit tests with {{-Dtest.runners}} in the config file to reflect the number of 
> CPUs on the instances, or change the build so that it is able to detect the 
> appropriate number of cores available automatically?
> Additionally, it seems this default config file (config.yml) is not as well 
> maintained as the 
> [{{config-2_1.yml}}|https://github.com/apache/cassandra/blob/trunk/.circleci/config-2_1.yml]
>  (+its lowres/highres) version in the same folder (from CASSANDRA-14806). 
> What is the reasoning for maintaining these 2 versions of the build? Could 
> the better maintained version be used as the default? We could generate a 
> lowres version of the new config-2_1.yml, and rename it {{config.yml}} so 
> that it gets picked up by CircleCI automatically instead of the current 
> default.



