[ 
https://issues.apache.org/jira/browse/CASSANDRA-9983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654369#comment-14654369
 ] 

Paulo Motta commented on CASSANDRA-9983:
----------------------------------------

* The dropped mutations error was happening because the test was too aggressive 
with unthrottled compaction + 50 stress threads. This error was also happening 
[on 
Linux|http://cassci.datastax.com/view/cassandra-2.2/job/cassandra-2.2_dtest/lastCompletedBuild/testReport/incremental_repair_test/TestIncRepair/multiple_subsequent_repair_test/].
 Since load testing is not the objective of this test, I disabled the 
compaction unthrottling and reduced the number of stress threads to 10, which 
may make the test slower, but at least it'll pass.
** There are other dtests failing for the same reason on both Linux and 
Windows, so a better long term solution would probably be integrating stress 
with ccm in a way that doesn't cause dropped mutation errors when running 
multiple nodes in the same physical box.
* The Java Heap Space problem was due to heavy compactions being executed 
concurrently with incremental repairs. I didn't find the root cause of the 
issue, but waiting for compactions to finish before triggering repair resolved 
the problem.
* The reference leak (LEAK DETECTED) error was a consequence of the previous problems.

The cassandra-dtest fix is on this 
[PR|https://github.com/riptano/cassandra-dtest/pull/441]. It depends on this 
[CCM PR|https://github.com/pcmanus/ccm/pull/346].
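
As a rough illustration of the "wait for compactions to finish before triggering repair" workaround described above, here is a minimal polling sketch. It is not the code from either PR: {{wait_for_compactions}} as written here and the {{pending_compactions}} callable are hypothetical names chosen for this example. In a real dtest the callable would presumably wrap something like a parse of {{nodetool compactionstats}} output for a single node.

```python
import time


def wait_for_compactions(pending_compactions, timeout=120.0, poll_interval=1.0,
                         clock=time.monotonic, sleep=time.sleep):
    """Block until pending_compactions() returns 0, or raise on timeout.

    pending_compactions is a hypothetical zero-argument callable that returns
    the number of compactions still pending on one node. clock and sleep are
    injectable only so the helper can be exercised without real waiting.
    """
    deadline = clock() + timeout
    while True:
        pending = pending_compactions()
        if pending == 0:
            return
        if clock() >= deadline:
            raise TimeoutError(
                "%d compaction(s) still pending after %.0fs" % (pending, timeout))
        sleep(poll_interval)
```

In the test, a call like this would sit between the stress run and the incremental repair, once per node, so that repair streaming does not compete with heavy compactions for heap.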

> Windows dtest: 
> incremental_repair_test.py:TestIncRepair.multiple_subsequent_repair_test
> ---------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-9983
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9983
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: Paulo Motta
>            Assignee: Paulo Motta
>            Priority: Minor
>             Fix For: 2.2.x
>
>
> This test was failing on the latest cassci builds 
> ([#44|http://cassci.datastax.com/view/win32/job/cassandra-2.2_dtest_win32/44/testReport/junit/incremental_repair_test/TestIncRepair/multiple_subsequent_repair_test_2/]
>  and 
> [#38|http://cassci.datastax.com/view/win32/job/cassandra-2.2_dtest_win32/38/testReport/junit/incremental_repair_test/TestIncRepair/multiple_subsequent_repair_test_2/]).
>  It also made the whole suite hang sometimes. For this reason, it was 
> disabled in the Windows dtests in [this 
> commit|https://github.com/riptano/cassandra-dtest/commit/478731bc830b62453a3b3996bf9dd0bfd1f6c2a6].
> Below are some stack traces:
> {noformat}
> ERROR [ScheduledTasks:1] 2015-08-04 16:49:30,865 MessagingService.java:894 - 
> MUTATION messages were dropped in last 5000 ms: 112 for internal timeout and 
> 0 for cross node timeout
> {noformat}
> {noformat}
> ERROR [STREAM-OUT-/127.0.0.3] 2015-08-04 16:51:01,820 StreamSession.java:521 
> - [Stream #96593730-3ae1-11e5-b4d0-358ed15ecf72] Streaming error occurred
> java.lang.OutOfMemoryError: Java heap space
>         at 
> com.ning.compress.lzf.BufferRecycler.allocOutputBuffer(BufferRecycler.java:79)
>  ~[compress-lzf-0.8.4.jar:na]
>         at 
> com.ning.compress.lzf.LZFOutputStream.<init>(LZFOutputStream.java:50) 
> ~[compress-lzf-0.8.4.jar:na]
>         at 
> org.apache.cassandra.streaming.StreamWriter.write(StreamWriter.java:83) 
> ~[main/:na]
>         at 
> org.apache.cassandra.streaming.messages.OutgoingFileMessage.serialize(OutgoingFileMessage.java:96)
>  ~[main/:na]
>         at 
> org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:48)
>  ~[main/:na]
>         at 
> org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:40)
>  ~[main/:na]
>         at 
> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:46)
>  ~[main/:na]
>         at 
> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:363)
>  ~[main/:na]
>         at 
> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:335)
>  ~[main/:na]
> --
> {noformat}
> {noformat}
> ERROR [Reference-Reaper:1] 2015-08-04 16:51:42,663 Ref.java:187 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@344b91f8) to class 
> org.apache.cassandra.io.util.SafeMemory$MemoryTidy@1718063907:Memory@[75039b10..75039b14)
>  was not released before the reference was garbage collected
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
