[jira] [Commented] (CASSANDRA-18009) Tune parallelism for circleci jobs

Berenguer Blasi (Jira) Tue, 08 Nov 2022 01:23:14 -0800


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-18009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630301#comment-17630301
 ]


Berenguer Blasi commented on CASSANDRA-18009:
---------------------------------------------

I may have the wrong end of the stick here, or lack context, but the CircleCI 
test splitter is already configured to take test duration into account 
(https://circleci.com/docs/parallelism-faster-jobs/), so you only have to 
worry about how many workers you want. #justfyi
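
To illustrate the general idea of duration-aware splitting, here is a minimal 
sketch. It is not CircleCI's actual algorithm, which this thread does not 
describe. The function name split_by_timing and the sample durations are 
hypothetical. It assigns the longest tests first to whichever worker has the 
least total time so far.

```python
import heapq

def split_by_timing(durations, workers):
    """Greedy sketch: give the longest remaining test to the least-loaded worker.

    durations: dict mapping test file name -> seconds (hypothetical timing data).
    Returns a list of (total_seconds, worker_index, files), ordered by worker index.
    """
    # Min-heap keyed on total seconds; the unique worker index breaks ties.
    heap = [(0.0, i, []) for i in range(workers)]
    heapq.heapify(heap)
    longest_first = sorted(durations.items(), key=lambda kv: kv[1], reverse=True)
    for name, seconds in longest_first:
        total, idx, files = heapq.heappop(heap)
        files.append(name)
        heapq.heappush(heap, (total + seconds, idx, files))
    return sorted(heap, key=lambda entry: entry[1])

# Hypothetical timings: one slow test and three faster ones, two workers.
example = {"ATest.java": 120, "BTest.java": 30, "CTest.java": 30, "DTest.java": 60}
for total, idx, files in split_by_timing(example, 2):
    print(idx, total, files)
```

With timings available, the split balances wall-clock time rather than file 
count, so the only remaining knob is the number of workers.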

> Tune parallelism for circleci jobs
> ----------------------------------
>
>                 Key: CASSANDRA-18009
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18009
>             Project: Cassandra
>          Issue Type: Task
>          Components: CI
>            Reporter: Josh McKenzie
>            Priority: Normal
>
> We should tune the parallel parameters for our circleci config to be more 
> optimal. From the email / slack conversations on the topic:
> {code}
> > import math
> > import os
> > 
> > def java_parallelism(src_dir, kind, num_file_in_worker, include=lambda a, b: True):
> >     # Count the *Test.java files under <src_dir>/test/<kind> accepted by `include`.
> >     d = os.path.join(src_dir, 'test', kind)
> >     num_files = 0
> >     for root, dirs, files in os.walk(d):
> >         for f in files:
> >             if f.endswith('Test.java') and include(os.path.join(root, f), f):
> >                 num_files += 1
> >     # Workers needed at roughly num_file_in_worker files per worker.
> >     return math.floor(num_files / num_file_in_worker)
> > 
> > def fix_parallelism(args, contents):
> >     jobs = contents['jobs']
> > 
> >     unit_parallelism                = java_parallelism(args.src, 'unit', 20)
> >     jvm_dtest_parallelism           = java_parallelism(args.src, 'distributed', 4, lambda full, name: 'upgrade' not in full)
> >     jvm_dtest_upgrade_parallelism   = java_parallelism(args.src, 'distributed', 2, lambda full, name: 'upgrade' in full)
> {code}
> bq. `TL;DR - I find all test files we are going to run, and based off a 
> pre-defined variable that says “ideal” number of files per worker, I then 
> calculate how many workers we need.  So unit tests are num_files / 20 ~= 35 
> workers.  Can I be “smarter” by knowing which files have higher cost?  Sure… 
> but the “perfect” and the “average” are so similar that it wasn’t worth 
> it...`
> Quoting [~dcapwell]
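
As a rough illustration of the arithmetic in the quoted TL;DR: the sketch below 
assumes a hypothetical count of 700 unit test files (the ticket does not state 
the real number) and uses a hypothetical helper name, workers_for. It applies 
the same floor division as the quoted java_parallelism.

```python
import math

def workers_for(num_files, files_per_worker):
    # Same arithmetic as the quoted snippet: floor(num_files / files_per_worker).
    return math.floor(num_files / files_per_worker)

# 700 is an assumed file count chosen to match the "~= 35 workers" in the quote.
print(workers_for(700, 20))

# Floor division gives 0 when there are fewer files than files_per_worker.
print(workers_for(3, 4))
```

The second call shows one thing that may be worth guarding against: for a small 
test directory the computed parallelism can be 0.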



