[ https://issues.apache.org/jira/browse/CASSANDRA-20311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Semb Wever updated CASSANDRA-20311:
-------------------------------------------
    Description: 
Using CASSANDRA-20157, analyse 
#  what splits are problematic
#  what test types are configured with too many splits, or not enough


*Analyse Process*
The data could be graphed for better visualisation and to identify the mean, 
stddev, and trends, but the following poor man's approach is fine for a 
first pass.

Download the Jenkins consoleText logs we want to analyse:
{noformat}
for i in $(seq $first_build $last_build) ; do
  wget -q https://ci-cassandra.apache.org/job/Cassandra-5.0/$i/consoleText
  bash -c "grep '] Time ' consoleText | awk -F']' '{print $2}' | sort -u > Cassandra-5.0_${i}_timings.txt"
  rm consoleText
done
{noformat}
Alternatively, if using the nightlies.a.o webdav mount (the consoleText.xz 
files have been manually downloaded and put into place):
{noformat}
for i in $(seq $first_build $last_build) ; do
  bash -c "xzgrep '] Time ' <nightlies_webdav_mount>/cassandra/Cassandra-5.0/${i}/consoleText.xz | awk -F']' '{print $2}' | sort -u > Cassandra-5.0_${i}_timings.txt"
done
{noformat}

(1)
{code}
for i in $(seq $first_build $last_build) ; do
  grep " 01:0" Cassandra-5.0_${i}_timings.txt | awk '{print $3}'
done | sort | uniq -c | sort -rn
{code}
Target those individual splits that have timed out (duration of 1 hour) the 
most.
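
The step (1) pipeline can be wrapped in a function so that the same command serves both branches. This is only a sketch: `timed_out_splits` is a hypothetical helper name, and it assumes the timings files were created by one of the download loops above and sit in the current directory. Builds without a timings file are skipped.

```shell
# Count how often each individual split hit the one hour mark.
# Usage: timed_out_splits <branch> <first_build> <last_build>
# (hypothetical helper; file naming follows the download loops above)
timed_out_splits() {
  branch=$1
  first=$2
  last=$3
  for i in $(seq "$first" "$last") ; do
    f="Cassandra-${branch}_${i}_timings.txt"
    # skip builds whose timings file is missing
    [ -f "$f" ] || continue
    grep ' 01:0' "$f" | awk '{print $3}'
  done | sort | uniq -c | sort -rn
}
```

For example, `timed_out_splits 5.0 352 385` or `timed_out_splits trunk 1988 2021`.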

(2a)
For test types that have too few splits (are timing out at the 1 hour mark too 
often).
{code}
for i in $(seq $first_build $last_build) ; do
  grep " 01:0" Cassandra-5.0_${i}_timings.txt | awk '{print $3}'
done | awk -F_ '{print $1}' | sort | uniq -c | sort -rn
{code}

(2b)
For test types that have too many splits (are finishing faster than ten 
minutes).
{code}
for i in $(seq $first_build $last_build) ; do
  grep " 00:0" Cassandra-5.0_${i}_timings.txt | awk '{print $3}'
done | awk -F_ '{print $1}' | sort | uniq -c | sort -rn
{code}
This output will be weighted by those test types that have more splits (but 
that's ok because that's where we can save time / improve throughput most).
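
If the weighting ever gets in the way, the counts can be normalised by the number of runs per test type. A minimal sketch, assuming each timings line has the duration as its second whitespace-separated field and the split label as its third (which is what the grep/awk commands above imply); `fast_share` is a hypothetical helper name.

```shell
# For each test type print: share of runs under ten minutes, number of such
# runs, total runs, test type. Reads timings files given as arguments, or stdin.
# Assumes field 2 is the duration and field 3 the split label.
fast_share() {
  awk '{
    split($3, parts, "_")            # test type is the text before the first "_"
    type = parts[1]
    total[type]++
    if ($2 ~ /^00:0/) fast[type]++   # duration below ten minutes
  }
  END {
    for (t in total)
      printf "%.2f %d %d %s\n", fast[t] / total[t], fast[t], total[t], t
  }' "$@" | sort -rn
}
```

For example, `fast_share Cassandra-5.0_*_timings.txt` lists the test types with the highest proportion of short-running splits first.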

Replace "Cassandra-5.0_${i}_timings.txt" with 
"Cassandra-trunk_${i}_timings.txt" for trunk analysis, naturally.

*Results 5.0*
builds 352 - 385
(1)
{noformat}
   5 test-cdc_jdk11_python_3.8_no_amd64_2_8:
   4 test-latest_jdk17_python_3.8_no_amd64_2_8:
   4 dtest-upgrade_jdk11_python_3.8_no_amd64_40_64:
   4 dtest-upgrade_jdk11_python_3.8_no_amd64_37_64:
   3 test-system-keyspace-directory_jdk11_python_3.8_no_amd64_7_8:
   3 test-oa_jdk17_python_3.8_no_amd64_5_8:
   3 test-oa_jdk11_python_3.8_no_amd64_3_8:
   3 test-compression_jdk17_python_3.8_no_amd64_8_8:
   3 test-compression_jdk17_python_3.8_no_amd64_7_8:
   3 test-compression_jdk17_python_3.8_no_amd64_2_8:
   3 test-cdc_jdk11_python_3.8_no_amd64_4_8:
   3 dtest-upgrade_jdk11_python_3.8_no_amd64_54_64:
   3 dtest-upgrade_jdk11_python_3.8_no_amd64_53_64:
   3 dtest-upgrade_jdk11_python_3.8_no_amd64_28_64:
   3 dtest-upgrade-novnode_jdk11_python_3.8_no_amd64_10_64:
   3 dtest-upgrade-large_jdk11_python_3.8_no_amd64_12_32:
…
{noformat}
(format here is 
`<test_type>_<jdk>_<python>_<cython>_<arch>_<split>_<splits>:`, where 
`<python>` itself contains an underscore, e.g. `python_3.8`)
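
To illustrate how such a label breaks down, here is a small sketch. It assumes that test types never contain an underscore and that the python field always has the form `python_<version>`, as in the results above; `parse_split_label` is a hypothetical helper name.

```shell
# Break a split label such as "test-cdc_jdk11_python_3.8_no_amd64_2_8:"
# into its named parts. Assumes "_" separates the fields and the python
# field is "python_<version>" (so the version is the fourth "_" field).
parse_split_label() {
  printf '%s\n' "$1" | awk -F_ '{
    sub(/:$/, "", $NF)               # drop the trailing colon
    printf "test_type=%s jdk=%s python=%s cython=%s arch=%s split=%s splits=%s\n",
           $1, $2, $4, $5, $6, $(NF-1), $NF
  }'
}
```

For example, `parse_split_label 'dtest-upgrade_jdk11_python_3.8_no_amd64_40_64:'` reports split 40 of 64 for test type dtest-upgrade.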

(2a)
{noformat}
  54 dtest-upgrade
  43 dtest-upgrade-novnode
  18 test-oa
  18 test-compression
  15 test-cdc
  15 dtest-upgrade-large
  14 test-system-keyspace-directory
  14 test-latest
  14 dtest-upgrade-novnode-large
…
{noformat}
(2b)
{noformat}
 535 dtest-latest
 495 dtest
 341 long-test
 306 dtest-novnode
 167 jvm-dtest-upgrade
 102 cqlsh-test
  90 dtest-upgrade-large
  90 dtest-large-latest
  89 dtest-large
  77 dtest-upgrade-novnode-large
  49 stress-test
  49 fqltool-test
  34 dtest-large-novnode
  24 simulator-dtest
…
{noformat}

*Results trunk*
builds 1988 - 2021
(1)
{noformat}
  11 test-compression_jdk11_python_3.8_no_amd64_2_8:
  10 test-system-keyspace-directory_jdk17_python_3.8_no_amd64_2_8:
   9 dtest-upgrade_jdk11_python_3.8_no_amd64_1_64:
   7 test_jdk17_python_3.8_no_amd64_4_8:
   6 dtest-upgrade-novnode_jdk11_python_3.8_no_amd64_42_64:
   4 test-oa_jdk17_python_3.8_no_amd64_7_8:
   4 dtest-upgrade_jdk11_python_3.8_no_amd64_4_64:
   3 test_jdk17_python_3.8_no_amd64_6_8:
   3 test_jdk11_python_3.8_no_amd64_8_8:
   3 test-oa_jdk11_python_3.8_no_amd64_2_8:
   3 test-compression_jdk17_python_3.8_no_amd64_3_8:
   3 dtest-upgrade_jdk11_python_3.8_no_amd64_63_64:
   3 dtest-upgrade_jdk11_python_3.8_no_amd64_55_64:
   3 dtest-upgrade_jdk11_python_3.8_no_amd64_36_64:
   3 dtest-upgrade-novnode_jdk11_python_3.8_no_amd64_3_64:
   3 dtest-upgrade-novnode_jdk11_python_3.8_no_amd64_37_64:
   3 dtest-upgrade-novnode_jdk11_python_3.8_no_amd64_30_64:
   3 dtest-upgrade-novnode_jdk11_python_3.8_no_amd64_2_64:
   3 dtest-upgrade-novnode_jdk11_python_3.8_no_amd64_14_64:
   3 dtest-upgrade-novnode-large_jdk11_python_3.8_no_amd64_7_32:
   3 dtest-upgrade-large_jdk11_python_3.8_no_amd64_19_32:
   3 dtest-upgrade-large_jdk11_python_3.8_no_amd64_17_32:
…
{noformat}
(2a)
{noformat}
  66 dtest-upgrade
  62 dtest-upgrade-novnode
  23 test-system-keyspace-directory
  23 test
  22 test-compression
  18 test-oa
  17 test-latest
  17 dtest-upgrade-large
  14 jvm-dtest
  14 dtest-upgrade-novnode-large
…
{noformat}
(2b)
{noformat}
 334 long-test
 298 dtest-latest
 291 dtest
 185 dtest-novnode
 159 dtest-large-latest
 151 dtest-large
  98 cqlsh-test
  86 dtest-upgrade-large
  85 dtest-upgrade-novnode-large
  52 dtest-large-novnode
  51 stress-test
  51 fqltool-test
  45 jvm-dtest-upgrade
  25 simulator-dtest
…
{noformat}

*Observations*

- dtest-upgrade (+variants) needs more splits
- test (+variants) needs more splits
- dtest (+variants) could reduce splits
   -  taking a closer look, the spread up to the one hour timeout value makes 
this difficult to change
- long-test could reduce splits
- dtest-large (+variants) could reduce splits
- jvm-dtest-upgrade in 5.0 could reduce splits
- dtest-upgrade-large (+variants) have a lot of variability (both <10 minutes 
and >1 hour)
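
The variability noted above could be quantified with a mean and standard deviation per test type, short of graphing. A rough sketch only: it assumes the duration in field 2 is formatted as HH:MM:SS (the greps above only confirm that it starts with the hours and minutes), and `duration_stats` is a hypothetical helper name. Lines whose duration does not have three colon-separated parts are ignored.

```shell
# Print run count, mean and (population) standard deviation of the split
# duration per test type. Reads timings files given as arguments, or stdin.
# Assumes field 2 is an HH:MM:SS duration and field 3 the split label.
duration_stats() {
  awk '{
    n = split($2, hms, ":")
    if (n != 3) next                 # not an HH:MM:SS duration, skip
    secs = hms[1] * 3600 + hms[2] * 60 + hms[3]
    split($3, parts, "_")
    t = parts[1]
    count[t]++
    sum[t] += secs
    sumsq[t] += secs * secs
  }
  END {
    for (t in count) {
      mean = sum[t] / count[t]
      var = sumsq[t] / count[t] - mean * mean
      if (var < 0) var = 0           # guard against rounding error
      printf "%s n=%d mean=%.0fs stddev=%.0fs\n", t, count[t], mean, sqrt(var)
    }
  }' "$@" | sort
}
```

For example, `duration_stats Cassandra-trunk_*_timings.txt`. A test type with a large stddev relative to its mean is a poor candidate for simply adding or removing splits.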



  was:
Using CASSANDRA-20157, analyse 
#  what splits are problematic
#  what test types are configured with too many splits, or not enough


*Analyse Process*
The data could be graphed for better visualisation and to identify the mean, 
stddev, and trends, but the following poor man's approach is fine for a 
first pass.

Download the Jenkins consoleText logs we want to analyse:
{noformat}
for i in $(seq $first_build $last_build) ; do
  wget -q https://ci-cassandra.apache.org/job/Cassandra-5.0/$i/consoleText
  bash -c "grep '] Time ' consoleText | awk -F']' '{print $2}' | sort -u > Cassandra-5.0_${i}_timings.txt"
  rm consoleText
done
{noformat}
Alternatively, if using the nightlies.a.o webdav mount (the consoleText.xz 
files have been manually downloaded and put into place):
{noformat}
for i in $(seq $first_build $last_build) ; do
  bash -c "xzgrep '] Time ' <nightlies_webdav_mount>/cassandra/Cassandra-5.0/${i}/consoleText.xz | awk -F']' '{print $2}' | sort -u > Cassandra-5.0_${i}_timings.txt"
done
{noformat}

(1)
{code}
for i in $(seq $first_build $last_build) ; do
  grep " 01:0" Cassandra-5.0_${i}_timings.txt | awk '{print $3}'
done | sort | uniq -c | sort -rn
{code}
Target those individual splits that have timed out (duration of 1 hour) the 
most.

(2a)
For test types that have too few splits (are timing out at the 1 hour mark too 
often).
{code}
for i in $(seq $first_build $last_build) ; do
  grep " 01:0" Cassandra-5.0_${i}_timings.txt | awk '{print $3}'
done | awk -F_ '{print $1}' | sort | uniq -c | sort -rn
{code}

(2b)
For test types that have too many splits (are finishing faster than ten 
minutes).
{code}
for i in $(seq $first_build $last_build) ; do
  grep " 00:0" Cassandra-5.0_${i}_timings.txt | awk '{print $3}'
done | awk -F_ '{print $1}' | sort | uniq -c | sort -rn
{code}
This output will be weighted by those test types that have more splits (but 
that's ok because that's where we can save time / improve throughput most).

Replace "Cassandra-5.0_${i}_timings.txt" with 
"Cassandra-trunk_${i}_timings.txt" for trunk analysis, naturally.

*Results 5.0*
builds 352 - 385
(1)
{noformat}
   5 test-cdc_jdk11_python_3.8_no_amd64_2_8:
   4 test-latest_jdk17_python_3.8_no_amd64_2_8:
   4 dtest-upgrade_jdk11_python_3.8_no_amd64_40_64:
   4 dtest-upgrade_jdk11_python_3.8_no_amd64_37_64:
   3 test-system-keyspace-directory_jdk11_python_3.8_no_amd64_7_8:
   3 test-oa_jdk17_python_3.8_no_amd64_5_8:
   3 test-oa_jdk11_python_3.8_no_amd64_3_8:
   3 test-compression_jdk17_python_3.8_no_amd64_8_8:
   3 test-compression_jdk17_python_3.8_no_amd64_7_8:
   3 test-compression_jdk17_python_3.8_no_amd64_2_8:
   3 test-cdc_jdk11_python_3.8_no_amd64_4_8:
   3 dtest-upgrade_jdk11_python_3.8_no_amd64_54_64:
   3 dtest-upgrade_jdk11_python_3.8_no_amd64_53_64:
   3 dtest-upgrade_jdk11_python_3.8_no_amd64_28_64:
   3 dtest-upgrade-novnode_jdk11_python_3.8_no_amd64_10_64:
   3 dtest-upgrade-large_jdk11_python_3.8_no_amd64_12_32:
…
{noformat}
(format here is 
`<test_type>_<jdk>_<python>_<cython>_<arch>_<split>_<splits>:`, where 
`<python>` itself contains an underscore, e.g. `python_3.8`)

(2a)
{noformat}
  54 dtest-upgrade
  43 dtest-upgrade-novnode
  18 test-oa
  18 test-compression
  15 test-cdc
  15 dtest-upgrade-large
  14 test-system-keyspace-directory
  14 test-latest
  14 dtest-upgrade-novnode-large
…
{noformat}
(2b)
{noformat}
 535 dtest-latest
 495 dtest
 341 long-test
 306 dtest-novnode
 167 jvm-dtest-upgrade
 102 cqlsh-test
  90 dtest-upgrade-large
  90 dtest-large-latest
  89 dtest-large
  77 dtest-upgrade-novnode-large
  49 stress-test
  49 fqltool-test
  34 dtest-large-novnode
  24 simulator-dtest
…
{noformat}

*Results trunk*
builds 1988 - 2021
(1)
{noformat}
  11 test-compression_jdk11_python_3.8_no_amd64_2_8:
  10 test-system-keyspace-directory_jdk17_python_3.8_no_amd64_2_8:
   9 dtest-upgrade_jdk11_python_3.8_no_amd64_1_64:
   7 test_jdk17_python_3.8_no_amd64_4_8:
   6 dtest-upgrade-novnode_jdk11_python_3.8_no_amd64_42_64:
   4 test-oa_jdk17_python_3.8_no_amd64_7_8:
   4 dtest-upgrade_jdk11_python_3.8_no_amd64_4_64:
   3 test_jdk17_python_3.8_no_amd64_6_8:
   3 test_jdk11_python_3.8_no_amd64_8_8:
   3 test-oa_jdk11_python_3.8_no_amd64_2_8:
   3 test-compression_jdk17_python_3.8_no_amd64_3_8:
   3 dtest-upgrade_jdk11_python_3.8_no_amd64_63_64:
   3 dtest-upgrade_jdk11_python_3.8_no_amd64_55_64:
   3 dtest-upgrade_jdk11_python_3.8_no_amd64_36_64:
   3 dtest-upgrade-novnode_jdk11_python_3.8_no_amd64_3_64:
   3 dtest-upgrade-novnode_jdk11_python_3.8_no_amd64_37_64:
   3 dtest-upgrade-novnode_jdk11_python_3.8_no_amd64_30_64:
   3 dtest-upgrade-novnode_jdk11_python_3.8_no_amd64_2_64:
   3 dtest-upgrade-novnode_jdk11_python_3.8_no_amd64_14_64:
   3 dtest-upgrade-novnode-large_jdk11_python_3.8_no_amd64_7_32:
   3 dtest-upgrade-large_jdk11_python_3.8_no_amd64_19_32:
   3 dtest-upgrade-large_jdk11_python_3.8_no_amd64_17_32:
…
{noformat}
(2a)
{noformat}
  66 dtest-upgrade
  62 dtest-upgrade-novnode
  23 test-system-keyspace-directory
  23 test
  22 test-compression
  18 test-oa
  17 test-latest
  17 dtest-upgrade-large
  14 jvm-dtest
  14 dtest-upgrade-novnode-large
…
{noformat}
(2b)
{noformat}
 334 long-test
 298 dtest-latest
 291 dtest
 185 dtest-novnode
 159 dtest-large-latest
 151 dtest-large
  98 cqlsh-test
  86 dtest-upgrade-large
  85 dtest-upgrade-novnode-large
  52 dtest-large-novnode
  51 stress-test
  51 fqltool-test
  45 jvm-dtest-upgrade
  25 simulator-dtest
…
{noformat}

*Observations*

- dtest-upgrade (+variants) needs more splits
- test (+variants) needs more splits
- dtest (+variants) could reduce splits
- long-test could reduce splits
- dtest-large (+variants) could reduce splits
- jvm-dtest-upgrade in 5.0 could reduce splits
- dtest-upgrade-large (+variants) have a lot of variability (both <10 minutes 
and >1 hour)




> Adjust 5.0 and trunk Jenkinsfile's splits configuration
> -------------------------------------------------------
>
>                 Key: CASSANDRA-20311
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20311
>             Project: Apache Cassandra
>          Issue Type: Task
>          Components: CI
>            Reporter: Michael Semb Wever
>            Assignee: Michael Semb Wever
>            Priority: Normal
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
