[ 
https://issues.apache.org/jira/browse/CASSANDRA-12784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15571222#comment-15571222
 ] 

Stefania commented on CASSANDRA-12784:
--------------------------------------

This test has been timing out since it was introduced by CASSANDRA-12647. In 
fact, the problem is not {{testNewClusterWithMurmur3Partitioner}}, which was 
previously called {{testNewCluster}} and was timing out very rarely (due to the 
flaky utility?), but the new random partitioner tests, which take approximately 
twice as long as the murmur3 tests. On my laptop, with the current test 
configuration of 64 VNODES, the full test completes in about 483 seconds (see 
timings below). The total timeout on Jenkins is 600 seconds, so the test almost 
always times out, given that the Jenkins VMs are slower. This analysis does not 
take the flaky utility into account: when it kicks in, there is no chance that 
the test completes within the timeout.
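
As a quick sanity check of the arithmetic above, here is a minimal sketch that 
recomputes the ratios from the 64-VNODES timings listed under the measurements 
below. The class and method names are made up for illustration and are not part 
of the Cassandra code base.

```java
// Sanity check of the 64-VNODES timings quoted in this comment.
// TimingCheck and its methods are hypothetical helpers, not Cassandra code.
final class TimingCheck {
    // How many times slower one run is than another.
    static double ratio(double slowerSeconds, double fasterSeconds) {
        return slowerSeconds / fasterSeconds;
    }

    // Slowdown factor at which a run of totalSeconds reaches the timeout.
    static double slowdownToTimeout(double timeoutSeconds, double totalSeconds) {
        return timeoutSeconds / totalSeconds;
    }

    public static void main(String[] args) {
        // Per-test times in seconds, copied from the 64-VNODES report.
        double newRandom = 218.88, newMurmur3 = 110.613;
        double existingRandom = 101.944, existingMurmur3 = 51.654;
        double total = 483.197;   // whole suite on the laptop
        double timeout = 600.0;   // total timeout on Jenkins

        System.out.printf("new cluster, random/murmur3:      %.2f%n",
                          ratio(newRandom, newMurmur3));
        System.out.printf("existing cluster, random/murmur3: %.2f%n",
                          ratio(existingRandom, existingMurmur3));
        System.out.printf("slowdown needed to hit timeout:   %.2f%n",
                          slowdownToTimeout(timeout, total));
    }
}
```

Both ratios come out just under 2, and a machine only about 1.24 times slower 
than my laptop would already reach the 600 second timeout.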

I've attached a JFR profile, [^ReplicationAwareTokenAllocatorTest.jfr.gz]. The 
slowness of the random partitioner tests is due to the big integer math in 
{{BigIntegerToken.size()}}. Unless we plan on improving the performance of the 
algorithm or of the big integer math, may I suggest reducing the scope of the 
test? I don't think it's reasonable to run a unit test that takes longer than 
10 minutes; the full test can be moved to a burn test if required.
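
To illustrate why the big integer math costs more, here is a simplified sketch 
of a ring-range size computation done once with a primitive {{long}} and once 
with {{BigInteger}}. This is not the Cassandra implementation: the class name, 
the method names and the ring widths (2^64 and 2^127) are assumptions made for 
the example.

```java
import java.math.BigInteger;

// Simplified, hypothetical sketch; not the actual Cassandra token classes.
final class TokenSizeSketch {
    // Fraction of an assumed 2^64 ring covered by (from, to].
    // Uses primitive arithmetic only, so nothing is allocated.
    static double longSize(long from, long to) {
        long v = to - from;                            // overflow wraps, as on a ring
        double d = Math.scalb((double) v, -Long.SIZE); // scale so the full ring is 1
        return d > 0.0 ? d : d + 1.0;                  // adjust for wraparound
    }

    // Fraction of an assumed 2^127 ring covered by (from, to].
    // Every call allocates a new BigInteger and converts it to double.
    static double bigSize(BigInteger from, BigInteger to) {
        BigInteger v = to.subtract(from);
        double d = Math.scalb(v.doubleValue(), -127);  // scale so the full ring is 1
        return d > 0.0 ? d : d + 1.0;                  // adjust for wraparound
    }

    public static void main(String[] args) {
        BigInteger quarter = BigInteger.ONE.shiftLeft(125);
        System.out.println(longSize(0L, 1L << 62));
        System.out.println(bigSize(BigInteger.ZERO, quarter));
    }
}
```

Both methods return the same fraction for equivalent ranges, but the 
{{BigInteger}} variant does arbitrary-precision subtraction and a conversion to 
{{double}} on every call, which adds up when the method is called in a tight 
loop.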

One way to reduce the scope would be to reduce the number of iterations by 
reducing VNODES. Do you have any other suggestions, [~blambov] or [~dikanggu]?

h5. Measurements on my laptop:

*64-VNODES:*

{code}
<testsuite errors="0" failures="0" hostname="cuoricina" 
name="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
 skipped="0" tests="4" time="483.197" timestamp="2016-10-13T04:18:10">
<properties>...</properties>
<testcase 
classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
 name="testExistingClusterWithRandomPartitioner" time="101.944"/>
<testcase 
classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
 name="testExistingClusterWithMurmur3Partitioner" time="51.654"/>
<testcase 
classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
 name="testNewClusterWithRandomPartitioner" time="218.88"/>
<testcase 
classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
 name="testNewClusterWithMurmur3Partitioner" time="110.613"/>
<system-out>...</system-out>
<system-err>
<![CDATA[ ]]>
</system-err>
</testsuite>
{code}

*32-VNODES:*

{code}
<testsuite errors="0" failures="0" hostname="cuoricina" 
name="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
 skipped="0" tests="4" time="87.773" timestamp="2016-10-13T07:09:46">
<properties>...</properties>
<testcase 
classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
 name="testExistingClusterWithMurmur3Partitioner" time="9.548"/>
<testcase 
classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
 name="testNewClusterWithMurmur3Partitioner" time="18.18"/>
<testcase 
classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
 name="testExistingClusterWithRandomPartitioner" time="23.235"/>
<testcase 
classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
 name="testNewClusterWithRandomPartitioner" time="36.709"/>
<system-out>...</system-out>
<system-err>
<![CDATA[ ]]>
</system-err>
</testsuite>
{code}

*16-VNODES:*

{code}
<testsuite errors="0" failures="0" hostname="cuoricina" 
name="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
 skipped="0" tests="4" time="89.858" timestamp="2016-10-13T06:56:31">
<properties>...</properties>
<testcase 
classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
 name="testExistingClusterWithMurmur3Partitioner" time="10.067"/>
<testcase 
classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
 name="testNewClusterWithMurmur3Partitioner" time="17.662"/>
<testcase 
classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
 name="testExistingClusterWithRandomPartitioner" time="23.678"/>
<testcase 
classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
 name="testNewClusterWithRandomPartitioner" time="38.345"/>
<system-out>...</system-out>
<system-err>
<![CDATA[ ]]>
</system-err>
</testsuite>
{code}

*8-VNODES:*

{code}
<testsuite errors="0" failures="0" hostname="cuoricina" 
name="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
 skipped="0" tests="4" time="22.055" timestamp="2016-10-13T06:32:58">
<properties>...</properties>
<testcase 
classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
 name="testExistingClusterWithMurmur3Partitioner" time="3.219"/>
<testcase 
classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
 name="testNewClusterWithMurmur3Partitioner" time="4.336"/>
<testcase 
classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
 name="testExistingClusterWithRandomPartitioner" time="5.747"/>
<testcase 
classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
 name="testNewClusterWithRandomPartitioner" time="8.629"/>
<system-out>...</system-out>
<system-err>
<![CDATA[ ]]>
</system-err>
</testsuite>
{code}


*4-VNODES:*

{code}
<testsuite errors="0" failures="0" hostname="cuoricina" 
name="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
 skipped="0" tests="4" time="22.55" timestamp="2016-10-13T07:21:39">
<properties>...</properties>
<testcase 
classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
 name="testExistingClusterWithMurmur3Partitioner" time="3.112"/>
<testcase 
classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
 name="testNewClusterWithMurmur3Partitioner" time="4.453"/>
<testcase 
classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
 name="testExistingClusterWithRandomPartitioner" time="6.112"/>
<testcase 
classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
 name="testNewClusterWithRandomPartitioner" time="8.766"/>
<system-out>...</system-out>
<system-err>
<![CDATA[ ]]>
</system-err>
</testsuite>
{code}


--

I've also noticed two failures with the following exception:

{code}
    [junit] ------------- ---------------- ---------------
    [junit] Testcase: testNewClusterWithRandomPartitioner(org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest):  FAILED
    [junit] Expected max unit size below 1.2000, was 1.2241
    [junit] junit.framework.AssertionFailedError: Expected max unit size below 1.2000, was 1.2241
    [junit]     at org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest.grow(ReplicationAwareTokenAllocatorTest.java:698)
    [junit]     at org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest.testNewCluster(ReplicationAwareTokenAllocatorTest.java:629)
    [junit]     at org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest.flakyTestNewCluster(ReplicationAwareTokenAllocatorTest.java:611)
    [junit]     at org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest.flakyTestNewClusterWithRandomPartitioner(ReplicationAwareTokenAllocatorTest.java:583)
    [junit]     at org.apache.cassandra.Util.runCatchingAssertionError(Util.java:576)
    [junit]     at org.apache.cassandra.Util.flakyTest(Util.java:601)
    [junit]     at org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest.testNewClusterWithRandomPartitioner(ReplicationAwareTokenAllocatorTest.java:568)
{code}

Is the flaky utility effective for the random partitioner tests?

> ReplicationAwareTokenAllocatorTest times out almost every time for 3.X and 
> trunk
> --------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-12784
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12784
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Stefania
>            Assignee: Stefania
>             Fix For: 3.x
>
>         Attachments: ReplicationAwareTokenAllocatorTest.jfr.gz
>
>
> Example failure: 
> http://cassci.datastax.com/view/cassandra-3.X/job/cassandra-3.X_testall/lastCompletedBuild/testReport/org.apache.cassandra.dht.tokenallocator/ReplicationAwareTokenAllocatorTest/testNewClusterWithMurmur3Partitioner/



