[ https://issues.apache.org/jira/browse/CASSANDRA-12784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15571222#comment-15571222 ]
Stefania commented on CASSANDRA-12784:
--------------------------------------

This test has been timing out since it was introduced by CASSANDRA-12647. In fact, the problem is not {{testNewClusterWithMurmur3Partitioner}}, which was previously called {{testNewCluster}} and timed out only very rarely (due to the flaky utility?), but the new random partitioner tests, which take approximately twice as long as the murmur3 tests.

On my laptop, with the current test configuration of 64 VNODES, the full test completes in approximately 500 seconds (see the timings below). The total timeout on Jenkins is 600 seconds, so the test almost always times out, given that the Jenkins VMs are slower. This analysis does not take the flaky utility into account: when it kicks in, there is no chance that the test completes within the timeout.

I've attached a JFR profile, [^ReplicationAwareTokenAllocatorTest.jfr.gz]. The slowness of the random partitioner tests is due to the big integer math in {{BigIntegerToken.size()}}.

Unless we plan on improving the performance of the algorithm or of the big integer math, may I suggest reducing the scope of the test? I don't think it's reasonable to run a unit test that takes longer than 10 minutes; the full test can be moved to a burn test if required. One way to reduce the scope would be to reduce the number of iterations by reducing VNODES. Do you have any other suggestions, [~blambov] or [~dikanggu]?

h5. 
Measurements on my laptop:

*64-VNODES:*
{code}
<testsuite errors="0" failures="0" hostname="cuoricina" name="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest" skipped="0" tests="4" time="483.197" timestamp="2016-10-13T04:18:10">
  <properties>...</properties>
  <testcase classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest" name="testExistingClusterWithRandomPartitioner" time="101.944"/>
  <testcase classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest" name="testExistingClusterWithMurmur3Partitioner" time="51.654"/>
  <testcase classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest" name="testNewClusterWithRandomPartitioner" time="218.88"/>
  <testcase classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest" name="testNewClusterWithMurmur3Partitioner" time="110.613"/>
  <system-out>...</system-out>
  <system-err> <![CDATA[ ]]> </system-err>
</testsuite>
{code}

*32-VNODES:*
{code}
<testsuite errors="0" failures="0" hostname="cuoricina" name="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest" skipped="0" tests="4" time="87.773" timestamp="2016-10-13T07:09:46">
  <properties>...</properties>
  <testcase classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest" name="testExistingClusterWithMurmur3Partitioner" time="9.548"/>
  <testcase classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest" name="testNewClusterWithMurmur3Partitioner" time="18.18"/>
  <testcase classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest" name="testExistingClusterWithRandomPartitioner" time="23.235"/>
  <testcase classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest" name="testNewClusterWithRandomPartitioner" time="36.709"/>
  <system-out>...</system-out>
  <system-err> <![CDATA[ ]]> </system-err>
</testsuite>
{code}

*16-VNODES:*
{code}
<testsuite errors="0" failures="0" hostname="cuoricina" name="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest" skipped="0" tests="4" time="89.858" timestamp="2016-10-13T06:56:31">
  <properties>...</properties>
  <testcase classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest" name="testExistingClusterWithMurmur3Partitioner" time="10.067"/>
  <testcase classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest" name="testNewClusterWithMurmur3Partitioner" time="17.662"/>
  <testcase classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest" name="testExistingClusterWithRandomPartitioner" time="23.678"/>
  <testcase classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest" name="testNewClusterWithRandomPartitioner" time="38.345"/>
  <system-out>...</system-out>
  <system-err> <![CDATA[ ]]> </system-err>
</testsuite>
{code}

*8-VNODES:*
{code}
<testsuite errors="0" failures="0" hostname="cuoricina" name="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest" skipped="0" tests="4" time="22.055" timestamp="2016-10-13T06:32:58">
  <properties>...</properties>
  <testcase classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest" name="testExistingClusterWithMurmur3Partitioner" time="3.219"/>
  <testcase classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest" name="testNewClusterWithMurmur3Partitioner" time="4.336"/>
  <testcase classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest" name="testExistingClusterWithRandomPartitioner" time="5.747"/>
  <testcase classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest" name="testNewClusterWithRandomPartitioner" time="8.629"/>
  <system-out>...</system-out>
  <system-err> <![CDATA[ ]]> </system-err>
</testsuite>
{code}

*4-VNODES:*
{code}
<testsuite errors="0"
failures="0" hostname="cuoricina" name="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest" skipped="0" tests="4" time="22.55" timestamp="2016-10-13T07:21:39">
  <properties>...</properties>
  <testcase classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest" name="testExistingClusterWithMurmur3Partitioner" time="3.112"/>
  <testcase classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest" name="testNewClusterWithMurmur3Partitioner" time="4.453"/>
  <testcase classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest" name="testExistingClusterWithRandomPartitioner" time="6.112"/>
  <testcase classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest" name="testNewClusterWithRandomPartitioner" time="8.766"/>
  <system-out>...</system-out>
  <system-err> <![CDATA[ ]]> </system-err>
</testsuite>
{code}

--

I've also noticed two failures with the following exception:
{code}
[junit] ------------- ---------------- ---------------
[junit] Testcase: testNewClusterWithRandomPartitioner(org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest): FAILED
[junit] Expected max unit size below 1.2000, was 1.2241
[junit] junit.framework.AssertionFailedError: Expected max unit size below 1.2000, was 1.2241
[junit] at org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest.grow(ReplicationAwareTokenAllocatorTest.java:698)
[junit] at org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest.testNewCluster(ReplicationAwareTokenAllocatorTest.java:629)
[junit] at org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest.flakyTestNewCluster(ReplicationAwareTokenAllocatorTest.java:611)
[junit] at org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest.flakyTestNewClusterWithRandomPartitioner(ReplicationAwareTokenAllocatorTest.java:583)
[junit] at 
org.apache.cassandra.Util.runCatchingAssertionError(Util.java:576)
[junit] at org.apache.cassandra.Util.flakyTest(Util.java:601)
[junit] at org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest.testNewClusterWithRandomPartitioner(ReplicationAwareTokenAllocatorTest.java:568)
{code}

Is the flaky utility effective for the random partitioner tests?

> ReplicationAwareTokenAllocatorTest times out almost every time for 3.X and trunk
> --------------------------------------------------------------------------------
>
> Key: CASSANDRA-12784
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12784
> Project: Cassandra
> Issue Type: Bug
> Reporter: Stefania
> Assignee: Stefania
> Fix For: 3.x
>
> Attachments: ReplicationAwareTokenAllocatorTest.jfr.gz
>
>
> Example failure:
> http://cassci.datastax.com/view/cassandra-3.X/job/cassandra-3.X_testall/lastCompletedBuild/testReport/org.apache.cassandra.dht.tokenallocator/ReplicationAwareTokenAllocatorTest/testNewClusterWithMurmur3Partitioner/

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
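
As a rough check of the 64-VNODES timings reported in the comment, the Java sketch below recomputes the random/murmur3 ratio and the slowdown factor at which the suite would reach the 600-second Jenkins timeout. The class and method names are hypothetical and are not part of the Cassandra code base; the numbers are copied from the 64-VNODES report, and nothing here measures anything itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Cassandra: plain arithmetic on the reported timings.
class TimeoutHeadroom {
    // Total Jenkins timeout quoted in the comment, in seconds.
    static final double JENKINS_TIMEOUT_SECONDS = 600.0;

    // Sum of the per-testcase times, in seconds.
    static double total(Map<String, Double> timings) {
        double sum = 0.0;
        for (double seconds : timings.values()) {
            sum += seconds;
        }
        return sum;
    }

    // How many times slower the random partitioner variant is than the murmur3 one.
    static double ratio(double randomSeconds, double murmur3Seconds) {
        return randomSeconds / murmur3Seconds;
    }

    // Slowdown factor, relative to the measured machine, at which the suite hits the timeout.
    static double slowdownAtTimeout(double suiteSeconds) {
        return JENKINS_TIMEOUT_SECONDS / suiteSeconds;
    }

    public static void main(String[] args) {
        // Per-testcase times from the 64-VNODES report, in seconds.
        Map<String, Double> timings = new LinkedHashMap<>();
        timings.put("testExistingClusterWithRandomPartitioner", 101.944);
        timings.put("testExistingClusterWithMurmur3Partitioner", 51.654);
        timings.put("testNewClusterWithRandomPartitioner", 218.88);
        timings.put("testNewClusterWithMurmur3Partitioner", 110.613);

        // Suite time from the same report (includes overhead beyond the testcases).
        double suiteSeconds = 483.197;

        System.out.printf("sum of testcases: %.3f s%n", total(timings));
        System.out.printf("existing cluster, random/murmur3: %.2f%n", ratio(101.944, 51.654));
        System.out.printf("new cluster, random/murmur3: %.2f%n", ratio(218.88, 110.613));
        System.out.printf("slowdown that reaches the timeout: %.2f%n", slowdownAtTimeout(suiteSeconds));
    }
}
```

On these numbers, a machine roughly 24% slower than the laptop would already reach the timeout, before any retries by the flaky utility are counted.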