[ https://issues.apache.org/jira/browse/CASSANDRA-9983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654369#comment-14654369 ]
Paulo Motta commented on CASSANDRA-9983:
----------------------------------------

* The dropped mutations error was happening because the test was too aggressive: it combined unthrottled compaction with 50 stress threads. This error was also happening [on Linux|http://cassci.datastax.com/view/cassandra-2.2/job/cassandra-2.2_dtest/lastCompletedBuild/testReport/incremental_repair_test/TestIncRepair/multiple_subsequent_repair_test/]. Since load testing is not the objective of this test, I disabled the compaction unthrottling and reduced the number of stress threads to 10. This may make the test slower, but at least it will pass.
** Other dtests are failing for the same reason on both Linux and Windows, so a better long-term solution would probably be to integrate stress with ccm in a way that doesn't cause dropped mutation errors when running multiple nodes on the same physical box.
* The Java heap space problem was caused by heavy compactions running concurrently with incremental repairs. I didn't find the root cause, but waiting for compactions to finish before triggering repair resolved the problem.
* The reference leak problem was a consequence of the previous problems.

The cassandra-dtest fix is in this [PR|https://github.com/riptano/cassandra-dtest/pull/441]. It depends on this [CCM PR|https://github.com/pcmanus/ccm/pull/346].
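The heap-space fix boils down to "wait until compaction is idle, then repair". Below is a minimal sketch of that polling step. It assumes the pending count is read from `nodetool compactionstats` output containing a line like `pending tasks: N`. `parse_pending_tasks` and `wait_for_compactions` are hypothetical helpers, not the code from the linked PRs and not the ccm API.

```python
import re
import time
from typing import Callable

# Assumption: compactionstats-style output contains a line like "pending tasks: 3".
_PENDING_RE = re.compile(r"pending tasks:\s*(\d+)")


def parse_pending_tasks(compactionstats_output: str) -> int:
    """Extract the pending compaction count (hypothetical helper)."""
    match = _PENDING_RE.search(compactionstats_output)
    if match is None:
        raise ValueError("no 'pending tasks' line found")
    return int(match.group(1))


def wait_for_compactions(get_stats: Callable[[], str],
                         timeout: float = 300.0,
                         interval: float = 1.0,
                         sleep: Callable[[float], None] = time.sleep,
                         clock: Callable[[], float] = time.monotonic) -> None:
    """Poll until no compactions are pending; raise TimeoutError otherwise.

    Hypothetical helper: get_stats is any callable returning the text of
    `nodetool compactionstats` (e.g. a wrapper around a ccm node).
    """
    deadline = clock() + timeout
    while True:
        if parse_pending_tasks(get_stats()) == 0:
            return  # compaction is idle, safe to trigger repair
        if clock() >= deadline:
            raise TimeoutError(f"compactions still pending after {timeout}s")
        sleep(interval)
```

A test would call `wait_for_compactions` with a callable that fetches compaction stats for each node, and only then trigger the incremental repair. Injecting `sleep` and `clock` keeps the helper testable without real delays.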
> Windows dtest: incremental_repair_test.py:TestIncRepair.multiple_subsequent_repair_test
> ---------------------------------------------------------------------------------------
>
> Key: CASSANDRA-9983
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9983
> Project: Cassandra
> Issue Type: Sub-task
> Reporter: Paulo Motta
> Assignee: Paulo Motta
> Priority: Minor
> Fix For: 2.2.x
>
>
> This test was failing on the latest cassci builds ([#44|http://cassci.datastax.com/view/win32/job/cassandra-2.2_dtest_win32/44/testReport/junit/incremental_repair_test/TestIncRepair/multiple_subsequent_repair_test_2/] and [#38|http://cassci.datastax.com/view/win32/job/cassandra-2.2_dtest_win32/38/testReport/junit/incremental_repair_test/TestIncRepair/multiple_subsequent_repair_test_2/]). It also sometimes made the whole suite hang. For this reason, it was disabled in the Windows dtests by [this commit|https://github.com/riptano/cassandra-dtest/commit/478731bc830b62453a3b3996bf9dd0bfd1f6c2a6].
> Below are some stack traces:
> {noformat}
> ERROR [ScheduledTasks:1] 2015-08-04 16:49:30,865 MessagingService.java:894 - MUTATION messages were dropped in last 5000 ms: 112 for internal timeout and 0 for cross node timeout
> {noformat}
> {noformat}
> ERROR [STREAM-OUT-/127.0.0.3] 2015-08-04 16:51:01,820 StreamSession.java:521 - [Stream #96593730-3ae1-11e5-b4d0-358ed15ecf72] Streaming error occurred
> java.lang.OutOfMemoryError: Java heap space
>     at com.ning.compress.lzf.BufferRecycler.allocOutputBuffer(BufferRecycler.java:79) ~[compress-lzf-0.8.4.jar:na]
>     at com.ning.compress.lzf.LZFOutputStream.<init>(LZFOutputStream.java:50) ~[compress-lzf-0.8.4.jar:na]
>     at org.apache.cassandra.streaming.StreamWriter.write(StreamWriter.java:83) ~[main/:na]
>     at org.apache.cassandra.streaming.messages.OutgoingFileMessage.serialize(OutgoingFileMessage.java:96) ~[main/:na]
>     at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:48) ~[main/:na]
>     at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:40) ~[main/:na]
>     at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:46) ~[main/:na]
>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:363) ~[main/:na]
>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:335) ~[main/:na]
> --
> {noformat}
> {noformat}
> ERROR [Reference-Reaper:1] 2015-08-04 16:51:42,663 Ref.java:187 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@344b91f8) to class org.apache.cassandra.io.util.SafeMemory$MemoryTidy@1718063907:Memory@[75039b10..75039b14) was not released before the reference was garbage collected
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
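The three failures above leave distinctive lines in the node logs. The following is a hypothetical, illustrative sketch of how a test could pick them out of a log. The regular expression is modeled only on the `MessagingService` line quoted above, so other Cassandra versions may word it differently. `find_dropped` and `detect_symptoms` are made-up names, not part of cassandra-dtest or ccm.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Set

# Modeled on the dropped-messages line quoted in this issue (assumption:
# the wording matches Cassandra 2.2's MessagingService log format).
DROPPED_RE = re.compile(
    r"(?P<verb>\w+) messages were dropped in last (?P<window_ms>\d+) ms: "
    r"(?P<internal>\d+) for internal timeout and "
    r"(?P<cross_node>\d+) for cross node timeout"
)

# Substrings marking the other two failures from this issue.
SYMPTOM_MARKERS = {
    "oom": "java.lang.OutOfMemoryError",
    "leak": "LEAK DETECTED",
}


@dataclass
class DroppedMessages:
    verb: str        # message type, e.g. MUTATION
    window_ms: int   # reporting window
    internal: int    # dropped for internal timeout
    cross_node: int  # dropped for cross node timeout


def find_dropped(lines: Iterable[str]) -> List[DroppedMessages]:
    """Return one record per 'messages were dropped' log line (hypothetical helper)."""
    found = []
    for line in lines:
        m = DROPPED_RE.search(line)
        if m:
            found.append(DroppedMessages(m["verb"], int(m["window_ms"]),
                                         int(m["internal"]), int(m["cross_node"])))
    return found


def detect_symptoms(lines: Iterable[str]) -> Set[str]:
    """Return the names of the heap/leak markers present in the log."""
    return {name for line in lines
            for name, marker in SYMPTOM_MARKERS.items() if marker in line}
```

Parsing the counts, rather than just matching `ERROR`, lets a test tell a handful of dropped mutations under load apart from the heap exhaustion and reference leaks that indicate a real problem.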