[jira] [Commented] (CASSANDRA-12784) ReplicationAwareTokenAllocatorTest times out almost every time for 3.X and trunk

Stefania (JIRA) Thu, 13 Oct 2016 22:54:32 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-12784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15574329#comment-15574329 ]


Stefania commented on CASSANDRA-12784:
--------------------------------------

These are the new timings on my laptop with 64 vnodes for Murmur3 and 16 vnodes 
for Random:

{code}
<testsuite errors="0" failures="0" hostname="cuoricina" name="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest" skipped="0" tests="4" time="216.687" timestamp="2016-10-14T02:23:57">
<properties>...</properties>
<testcase classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest" name="testExistingClusterWithMurmur3Partitioner" time="44.718"/>
<testcase classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest" name="testNewClusterWithMurmur3Partitioner" time="90.62"/>
<testcase classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest" name="testExistingClusterWithRandomPartitioner" time="24.512"/>
<testcase classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest" name="testNewClusterWithRandomPartitioner" time="56.712"/>
<system-out>...</system-out>
<system-err>
<![CDATA[ ]]>
</system-err>
</testsuite>
{code}

Unfortunately Jenkins is much slower. I multiplexed the test 10 times 
[here|https://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-testall-multiplex/25/testReport/org.apache.cassandra.dht.tokenallocator/]
 and each iteration took approximately 5.5 minutes. There were no failures, so 
I could not reproduce the problem mentioned above.

Because {{testNewClusterWithMurmur3Partitioner}} takes 2.5 minutes on Jenkins, 
a flaky test will time out if it performs more than 2 additional runs, so I 
changed the iterations to 2 for Murmur3 and 3 for Random. I've also split the 
class into two sub-classes, so that the total limit of 10 minutes is doubled. 
Otherwise, if both tests are flaky in the same run, it will certainly time out, 
and even if only {{testNewClusterWithMurmur3Partitioner}} is flaky, the total 
time is very close to the 10-minute limit.
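To make the arithmetic behind the "2 additional runs" figure explicit, here is a rough sketch. It is not code from the patch: the class {{TimeoutBudget}} and its method are hypothetical names, and it assumes that every rerun costs the same as the first run and that the test fails as soon as the elapsed time reaches the limit.

```java
// Hypothetical helper, not part of the Cassandra code base or of the patch.
public class TimeoutBudget {

    /**
     * Returns how many reruns (on top of the initial run) still finish
     * strictly before the limit, assuming each run takes runSeconds.
     */
    static int maxAdditionalRuns(double runSeconds, double limitSeconds) {
        if (runSeconds <= 0) {
            throw new IllegalArgumentException("runSeconds must be positive");
        }
        int additional = 0;
        // (additional + 2) runs = the initial run, the reruns counted so far,
        // and one more candidate rerun; accept it only if the total stays
        // under the limit.
        while ((additional + 2) * runSeconds < limitSeconds) {
            additional++;
        }
        return additional;
    }

    public static void main(String[] args) {
        // 2.5 minutes per run on Jenkins against a 10 minute limit.
        System.out.println("max additional runs: " + maxAdditionalRuns(150.0, 600.0));
    }
}
```

With 150 seconds per run and a 600 second limit, the initial run plus 2 reruns takes 450 seconds, while a third rerun would reach 600 seconds, which is where the limit of 2 additional runs comes from.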

This is the full patch:

||3.X||trunk||
|[patch|https://github.com/stef1927/cassandra/commits/12784-3.X]|[patch|https://github.com/stef1927/cassandra/commits/12784]|
|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12784-3.X-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12784-testall/]|

I'm also multiplexing the random partitioner tests 50 times 
[here|https://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-testall-multiplex/26/],
 to see if we can reproduce any failures despite the flaky utility.

[~blambov], [~dikanggu]: who wants to be the reviewer?

> ReplicationAwareTokenAllocatorTest times out almost every time for 3.X and 
> trunk
> --------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-12784
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12784
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Stefania
>            Assignee: Stefania
>             Fix For: 3.x
>
>         Attachments: ReplicationAwareTokenAllocatorTest.jfr.gz
>
>
> Example failure: 
> http://cassci.datastax.com/view/cassandra-3.X/job/cassandra-3.X_testall/lastCompletedBuild/testReport/org.apache.cassandra.dht.tokenallocator/ReplicationAwareTokenAllocatorTest/testNewClusterWithMurmur3Partitioner/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
