[ https://issues.apache.org/jira/browse/CASSANDRA-12784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15571222#comment-15571222 ]
Stefania commented on CASSANDRA-12784:
--------------------------------------
This test has been timing out since the changes introduced by CASSANDRA-12647.
The problem is not {{testNewClusterWithMurmur3Partitioner}}, which was
previously called {{testNewCluster}} and timed out only very rarely (perhaps
because of the flaky utility?), but the new random partitioner tests, which take
approximately twice as long as the murmur3 tests. On my laptop, with the current
test configuration of 64 VNODES, the full test completes in approximately 500
seconds (483 s, see timings below). The total timeout on Jenkins is 600 seconds
and the Jenkins VMs are slower than my laptop, so the test almost always times
out. This analysis does not take the flaky utility into account: when it kicks
in, the test has no chance of completing within the timeout.
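To make the timeout arithmetic explicit, here is a small sketch that uses only the four per-test timings from the 64-VNODES run below and the 600 second Jenkins timeout. The class and method names are made up for illustration, and the idea that a flaky retry reruns one whole test method (here {{testNewClusterWithRandomPartitioner}}) once is an assumption of the sketch, not something measured.

{code:java}
// Timeout arithmetic for the 64-VNODES laptop run reported in this comment.
// Class and method names are hypothetical; the timings and the 600 s limit come from the comment.
final class TimeoutBudget {
    static final double JENKINS_TIMEOUT_S = 600.0;

    // Per-test timings in seconds (64 VNODES, laptop).
    static final double EXISTING_RANDOM = 101.944;
    static final double EXISTING_MURMUR3 = 51.654;
    static final double NEW_RANDOM = 218.88;
    static final double NEW_MURMUR3 = 110.613;

    // Sum of the four test cases (the reported suite time of 483.197 s is slightly higher).
    static double total() {
        return EXISTING_RANDOM + EXISTING_MURMUR3 + NEW_RANDOM + NEW_MURMUR3;
    }

    // Largest uniform slowdown of the build machine, relative to the laptop, that still fits.
    static double maxTolerableSlowdown() {
        return JENKINS_TIMEOUT_S / total();
    }

    // Assumption: a flaky retry reruns one whole test method once.
    static double totalWithOneRerun(double rerunSeconds) {
        return total() + rerunSeconds;
    }

    public static void main(String[] args) {
        System.out.printf("total: %.1f s of %.0f s%n", total(), JENKINS_TIMEOUT_S);
        System.out.printf("max tolerable slowdown: %.2fx%n", maxTolerableSlowdown());
        System.out.printf("with one rerun of testNewClusterWithRandomPartitioner: %.1f s%n",
                totalWithOneRerun(NEW_RANDOM));
    }
}
{code}

On these numbers the tests sum to about 483.1 s, so a build machine only about 24% slower than the laptop (a factor of roughly 1.24) already hits the limit, and a single rerun of the slowest test brings the total to about 702 s.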
I've attached a JFR profile, [^ReplicationAwareTokenAllocatorTest.jfr.gz],
which shows that the slowness of the random partitioner tests is due to the big
integer math in {{BigIntegerToken.size()}}. Unless we plan on improving the
performance of the algorithm or of the big integer math, may I suggest reducing
the scope of the test? I don't think it's reasonable to run a unit test that
takes longer than 10 minutes; the full test can be moved to a burn test if
required.
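As an illustration of why 2^127 token math is costlier than 64-bit math, here is a hypothetical sketch that computes the fraction of the ring covered by a token range in two ways: with primitive longs, where subtraction simply wraps around, and with {{java.math.BigInteger}}, where each {{subtract}}/{{mod}} call returns a new immutable object. It is modeled on what a ring-size computation has to do, not copied from Cassandra's {{LongToken}} or {{BigIntegerToken}}, and the timing loop is a crude single pass rather than a JMH benchmark.

{code:java}
import java.math.BigInteger;
import java.util.Random;

// Illustration only, not Cassandra's implementation: the fraction of the ring covered by
// the range (from, to], computed on a 64-bit long ring and on a 2^127 BigInteger ring.
final class RingFraction {
    static final BigInteger RING_127 = BigInteger.ONE.shiftLeft(127);

    // 64-bit ring: long subtraction wraps mod 2^64, so no explicit modulo and no allocation.
    static double longFraction(long from, long to) {
        long v = to - from;                      // overflow wraps around, which is what we want
        double d = Math.scalb((double) v, -64);  // scale so the full ring is 1; d is about [-0.5, 0.5]
        return d > 0.0 ? d : d + 1.0;            // map into (0, 1]; from == to covers the whole ring
    }

    // 2^127 ring: subtract() and mod() each return a new BigInteger on every call.
    static double bigFraction(BigInteger from, BigInteger to) {
        BigInteger v = to.subtract(from).mod(RING_127);  // ring distance in [0, 2^127)
        double d = Math.scalb(v.doubleValue(), -127);    // scale so the full ring is 1
        return d > 0.0 ? d : 1.0;                        // from == to covers the whole ring
    }

    public static void main(String[] args) {
        final int n = 1_000_000;
        Random rnd = new Random(42);
        long[] longs = new long[n + 1];
        BigInteger[] bigs = new BigInteger[n + 1];
        for (int i = 0; i <= n; i++) {
            longs[i] = rnd.nextLong();
            bigs[i] = new BigInteger(127, rnd);          // uniform in [0, 2^127)
        }

        long t0 = System.nanoTime();
        double sumLong = 0;
        for (int i = 0; i < n; i++) sumLong += longFraction(longs[i], longs[i + 1]);
        long t1 = System.nanoTime();
        double sumBig = 0;
        for (int i = 0; i < n; i++) sumBig += bigFraction(bigs[i], bigs[i + 1]);
        long t2 = System.nanoTime();

        // Crude timing: no JIT warm-up and no JMH, so treat the numbers as indicative only.
        System.out.printf("long:       %d ms (sum %.1f)%n", (t1 - t0) / 1_000_000, sumLong);
        System.out.printf("BigInteger: %d ms (sum %.1f)%n", (t2 - t1) / 1_000_000, sumBig);
    }
}
{code}

The printed timings depend heavily on the JVM, JIT warm-up and machine, so they only hint at the relative cost of the two representations.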
One way to reduce the scope would be to reduce the number of iterations by
lowering VNODES. Do you have any other suggestions, [~blambov] or [~dikanggu]?
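A minimal sketch of lowering VNODES for the unit test while keeping the full configuration available to a burn test: the VNODES value is read from a system property with a small default. The property name {{cassandra.test.tokenallocator.vnodes}}, the class name and the default of 8 are all hypothetical; 8 is only picked because the 8-VNODES run below finished in about 22 seconds.

{code:java}
// Hypothetical sketch: a small VNODES default for unit tests, overridable for a burn test.
// The property name and the default value are made up for illustration.
final class VnodesConfig {
    static final String VNODES_PROPERTY = "cassandra.test.tokenallocator.vnodes";
    static final int UNIT_TEST_DEFAULT = 8;

    static int vnodes() {
        // Integer.getInteger returns the default when the property is unset or not a number.
        int v = Integer.getInteger(VNODES_PROPERTY, UNIT_TEST_DEFAULT);
        if (v <= 0) {
            throw new IllegalArgumentException(VNODES_PROPERTY + " must be positive, got " + v);
        }
        return v;
    }

    public static void main(String[] args) {
        System.out.println("VNODES = " + vnodes());
    }
}
{code}

A burn-test job could then pass {{-Dcassandra.test.tokenallocator.vnodes=64}} to the JVM, while the regular unit test run keeps the small default.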
h5. Measurements on my laptop:
*64-VNODES:*
{code}
<testsuite errors="0" failures="0" hostname="cuoricina"
name="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
skipped="0" tests="4" time="483.197" timestamp="2016-10-13T04:18:10">
<properties>...</properties>
<testcase
classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
name="testExistingClusterWithRandomPartitioner" time="101.944"/>
<testcase
classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
name="testExistingClusterWithMurmur3Partitioner" time="51.654"/>
<testcase
classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
name="testNewClusterWithRandomPartitioner" time="218.88"/>
<testcase
classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
name="testNewClusterWithMurmur3Partitioner" time="110.613"/>
<system-out>...</system-out>
<system-err>
<![CDATA[ ]]>
</system-err>
</testsuite>
{code}
*32-VNODES:*
{code}
<testsuite errors="0" failures="0" hostname="cuoricina"
name="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
skipped="0" tests="4" time="87.773" timestamp="2016-10-13T07:09:46">
<properties>...</properties>
<testcase
classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
name="testExistingClusterWithMurmur3Partitioner" time="9.548"/>
<testcase
classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
name="testNewClusterWithMurmur3Partitioner" time="18.18"/>
<testcase
classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
name="testExistingClusterWithRandomPartitioner" time="23.235"/>
<testcase
classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
name="testNewClusterWithRandomPartitioner" time="36.709"/>
<system-out>...</system-out>
<system-err>
<![CDATA[ ]]>
</system-err>
</testsuite>
{code}
*16-VNODES:*
{code}
<testsuite errors="0" failures="0" hostname="cuoricina"
name="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
skipped="0" tests="4" time="89.858" timestamp="2016-10-13T06:56:31">
<properties>...</properties>
<testcase
classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
name="testExistingClusterWithMurmur3Partitioner" time="10.067"/>
<testcase
classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
name="testNewClusterWithMurmur3Partitioner" time="17.662"/>
<testcase
classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
name="testExistingClusterWithRandomPartitioner" time="23.678"/>
<testcase
classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
name="testNewClusterWithRandomPartitioner" time="38.345"/>
<system-out>...</system-out>
<system-err>
<![CDATA[ ]]>
</system-err>
</testsuite>
{code}
*8-VNODES:*
{code}
<testsuite errors="0" failures="0" hostname="cuoricina"
name="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
skipped="0" tests="4" time="22.055" timestamp="2016-10-13T06:32:58">
<properties>...</properties>
<testcase
classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
name="testExistingClusterWithMurmur3Partitioner" time="3.219"/>
<testcase
classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
name="testNewClusterWithMurmur3Partitioner" time="4.336"/>
<testcase
classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
name="testExistingClusterWithRandomPartitioner" time="5.747"/>
<testcase
classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
name="testNewClusterWithRandomPartitioner" time="8.629"/>
<system-out>...</system-out>
<system-err>
<![CDATA[ ]]>
</system-err>
</testsuite>
{code}
*4-VNODES:*
{code}
<testsuite errors="0" failures="0" hostname="cuoricina"
name="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
skipped="0" tests="4" time="22.55" timestamp="2016-10-13T07:21:39">
<properties>...</properties>
<testcase
classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
name="testExistingClusterWithMurmur3Partitioner" time="3.112"/>
<testcase
classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
name="testNewClusterWithMurmur3Partitioner" time="4.453"/>
<testcase
classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
name="testExistingClusterWithRandomPartitioner" time="6.112"/>
<testcase
classname="org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest"
name="testNewClusterWithRandomPartitioner" time="8.766"/>
<system-out>...</system-out>
<system-err>
<![CDATA[ ]]>
</system-err>
</testsuite>
{code}
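To back up the "approximately twice as long" estimate, the following sketch recomputes, for each VNODES value, the combined time of the two random partitioner tests divided by the combined time of the two murmur3 tests. The timings are copied verbatim from the reports above; the class name is hypothetical.

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

// Random-partitioner vs Murmur3-partitioner cost per VNODES value, from the reports above.
final class PartitionerRatio {
    // {existingRandom, newRandom, existingMurmur3, newMurmur3} in seconds.
    static final Map<Integer, double[]> TIMINGS = new LinkedHashMap<>();
    static {
        TIMINGS.put(64, new double[] {101.944, 218.88, 51.654, 110.613});
        TIMINGS.put(32, new double[] {23.235, 36.709, 9.548, 18.18});
        TIMINGS.put(16, new double[] {23.678, 38.345, 10.067, 17.662});
        TIMINGS.put(8, new double[] {5.747, 8.629, 3.219, 4.336});
        TIMINGS.put(4, new double[] {6.112, 8.766, 3.112, 4.453});
    }

    // Combined random-partitioner time divided by combined murmur3 time.
    static double ratio(double[] t) {
        return (t[0] + t[1]) / (t[2] + t[3]);
    }

    public static void main(String[] args) {
        for (Map.Entry<Integer, double[]> e : TIMINGS.entrySet()) {
            System.out.printf("%2d VNODES: random/murmur3 = %.2f%n", e.getKey(), ratio(e.getValue()));
        }
    }
}
{code}

On these numbers the ratio stays between roughly 1.90 (8 VNODES) and 2.24 (16 VNODES).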
--
I've also noticed two failures with the following exception:
{code}
[junit] ------------- ---------------- ---------------
[junit] Testcase: testNewClusterWithRandomPartitioner(org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest): FAILED
[junit] Expected max unit size below 1.2000, was 1.2241
[junit] junit.framework.AssertionFailedError: Expected max unit size below 1.2000, was 1.2241
[junit]     at org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest.grow(ReplicationAwareTokenAllocatorTest.java:698)
[junit]     at org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest.testNewCluster(ReplicationAwareTokenAllocatorTest.java:629)
[junit]     at org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest.flakyTestNewCluster(ReplicationAwareTokenAllocatorTest.java:611)
[junit]     at org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest.flakyTestNewClusterWithRandomPartitioner(ReplicationAwareTokenAllocatorTest.java:583)
[junit]     at org.apache.cassandra.Util.runCatchingAssertionError(Util.java:576)
[junit]     at org.apache.cassandra.Util.flakyTest(Util.java:601)
[junit]     at org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocatorTest.testNewClusterWithRandomPartitioner(ReplicationAwareTokenAllocatorTest.java:568)
{code}
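For readers unfamiliar with the failing assertion, here is a simplified, hypothetical reading of a "max unit size" check: with replication factor 1, each token owns the ring segment ending at it, a unit (node) owns the sum of its segments, and the check compares the largest unit's ownership with the ideal share of 1/units. Under this reading, 1.2241 would mean the most loaded unit owns about 22% more than its fair share. The real test also takes replication into account, and {{maxUnitSize}} is a made-up helper, not the test's {{grow}} method.

{code:java}
import java.util.Arrays;

// Simplified, hypothetical "max unit size" metric with replication factor 1.
final class MaxUnitSize {

    /**
     * @param tokens sorted token positions in [0, 1) on a unit ring
     * @param owner  owner[i] is the index of the unit that owns tokens[i]
     * @param units  number of units
     * @return the largest unit ownership divided by the ideal share 1/units
     */
    static double maxUnitSize(double[] tokens, int[] owner, int units) {
        double[] owned = new double[units];
        for (int i = 0; i < tokens.length; i++) {
            // Token i owns (previous token, token i]; the first token wraps around the ring.
            double prev = i == 0 ? tokens[tokens.length - 1] - 1.0 : tokens[i - 1];
            owned[owner[i]] += tokens[i] - prev;
        }
        double max = Arrays.stream(owned).max().orElse(0.0);
        return max * units;   // same as max / (1.0 / units)
    }

    public static void main(String[] args) {
        double threshold = 1.2;                    // the bound quoted in the failure above
        double[] tokens = {0.1, 0.2, 0.7, 0.8};    // made-up, deliberately unbalanced layout
        int[] owner = {0, 1, 0, 1};
        double size = maxUnitSize(tokens, owner, 2);
        System.out.printf("max unit size %.4f, %s%n", size, size < threshold ? "ok" : "above threshold");
    }
}
{code}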
Is the flaky utility effective for the random partitioner tests?
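One way to reason about that question: a retry wrapper only helps when a failure is random from run to run, and each failed attempt costs a full run of the test. The sketch below is a hypothetical retry-on-{{AssertionError}} wrapper written for illustration; it is not Cassandra's {{Util.flakyTest}}, whose exact behaviour may differ.

{code:java}
// Hypothetical retry-on-assertion-error wrapper, not Cassandra's Util.flakyTest.
final class RetryOnAssertion {

    /** Runs test up to maxAttempts times and returns the attempt that passed; rethrows the last failure. */
    static int runWithRetries(Runnable test, int maxAttempts) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        AssertionError last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                test.run();                  // every attempt is a full run of the test body
                return attempt;
            } catch (AssertionError e) {
                last = e;                    // only assertion failures are retried
            }
        }
        throw last;                          // non-null: at least one attempt ran and failed
    }

    public static void main(String[] args) {
        int[] calls = {0};
        int attempts = runWithRetries(() -> {
            calls[0]++;
            if (calls[0] < 2) throw new AssertionError("Expected max unit size below 1.2000");
        }, 3);
        System.out.println("passed after " + attempts + " attempts");
    }
}
{code}

If the random partitioner tests exceed the 1.2 bound systematically rather than occasionally, every attempt fails and the wrapper only multiplies the run time by the number of attempts.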
> ReplicationAwareTokenAllocatorTest times out almost every time for 3.X and trunk
> --------------------------------------------------------------------------------
>
> Key: CASSANDRA-12784
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12784
> Project: Cassandra
> Issue Type: Bug
> Reporter: Stefania
> Assignee: Stefania
> Fix For: 3.x
>
> Attachments: ReplicationAwareTokenAllocatorTest.jfr.gz
>
>
> Example failure:
> http://cassci.datastax.com/view/cassandra-3.X/job/cassandra-3.X_testall/lastCompletedBuild/testReport/org.apache.cassandra.dht.tokenallocator/ReplicationAwareTokenAllocatorTest/testNewClusterWithMurmur3Partitioner/
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)