[ 
https://issues.apache.org/jira/browse/CASSANDRA-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998789#comment-14998789
 ] 

Paulo Motta commented on CASSANDRA-10644:
-----------------------------------------

Basically there is a 1 second gap between when the repair stream connection 
handler is closed and when the outgoing socket is actually closed gracefully 
({{messageQueue.poll(1, TimeUnit.SECONDS)}} on {{OutgoingMessageHandler}}). 
When node2 is abruptly stopped before that 1 second has passed, the incoming 
socket on the other side (node3) is closed gracefully on Linux, but not on 
Windows (see this StackOverflow 
[thread|http://stackoverflow.com/questions/22931811/differences-on-java-sockets-between-windows-and-linux-how-to-handle-them]
 for more details).
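
To illustrate the gap, here is a minimal Python sketch of a handler thread that only notices it was closed once its timed queue poll returns. It is not the Cassandra code: {{OutgoingHandlerSketch}}, {{measure_close_gap}} and the loop structure are assumptions made for illustration, with {{queue.Queue.get(timeout=...)}} standing in for {{messageQueue.poll(1, TimeUnit.SECONDS)}}.

```python
import queue
import threading
import time


class OutgoingHandlerSketch:
    """Hypothetical stand-in for an outgoing message handler thread."""

    def __init__(self, poll_timeout=1.0):
        self.poll_timeout = poll_timeout
        self.message_queue = queue.Queue()
        self.closed = threading.Event()         # set when the handler is asked to close
        self.socket_closed = threading.Event()  # set when the socket would really be closed
        self.thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self.thread.start()

    def close(self):
        # Only flags the handler; the worker sees the flag after its current poll returns.
        self.closed.set()

    def _run(self):
        while not self.closed.is_set():
            try:
                # Blocks for up to poll_timeout seconds waiting for a message.
                self.message_queue.get(timeout=self.poll_timeout)
            except queue.Empty:
                continue
        # The graceful socket close would happen here (assumption).
        self.socket_closed.set()


def measure_close_gap(poll_timeout):
    """Return the seconds between close() and the simulated socket close."""
    handler = OutgoingHandlerSketch(poll_timeout)
    handler.start()
    time.sleep(poll_timeout / 10)  # give the worker time to block in the poll
    start = time.monotonic()
    handler.close()
    handler.socket_closed.wait(timeout=poll_timeout * 5)
    return time.monotonic() - start


if __name__ == "__main__":
    print("gap between close() and socket close: %.2fs" % measure_close_gap(1.0))
```

In this sketch, a process killed inside that window never reaches the graceful close, which corresponds to the situation in which node3 sees the connection reset on Windows.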

Most of the time the test does not fail because node2 is stopped after this 1 
second period, so a quick and dirty fix is to sleep for 2 seconds on Windows 
before abruptly stopping node2 after a repair session. Since this is a very 
specific and unlikely situation, I think it's enough to address this only in 
the dtest. WDYT [~yukim]?
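
A rough sketch of what such a workaround could look like in a Python dtest follows. This is not the code from the PR: the helper name {{pause_before_abrupt_stop}} and its parameters are hypothetical, and the {{sleep}} argument is injectable only so the helper can be exercised without actually waiting.

```python
import platform
import time


def pause_before_abrupt_stop(is_windows=None, delay_seconds=2, sleep=time.sleep):
    """Sleep on Windows so the stream socket can be closed gracefully first.

    Returns the number of seconds slept (0 on other platforms).
    """
    if is_windows is None:
        is_windows = platform.system() == "Windows"
    if not is_windows:
        return 0
    # 2 seconds comfortably covers the 1 second poll window described above.
    sleep(delay_seconds)
    return delay_seconds


# Hypothetical usage inside the test, after the repair session finishes:
#
#     pause_before_abrupt_stop()
#     node2.stop(gently=False)   # assumes a ccm-style node object
```

Keeping the delay Windows-only avoids slowing the test down on Linux, where the abrupt stop already results in a graceful close on the other side.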

Created [dtest PR|https://github.com/riptano/cassandra-dtest/pull/654] with 
the quick and dirty fix.

> multiple repair dtest fails under Windows
> -----------------------------------------
>
>                 Key: CASSANDRA-10644
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10644
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: Jim Witschey
>            Assignee: Paulo Motta
>             Fix For: 3.1, 2.2.x
>
>
> {{incremental_repair_test.py:TestIncRepair.multiple_repair_test}} flaps on 
> CassCI Windows runs on C* 3.0:
> http://cassci.datastax.com/view/win32/job/cassandra-3.0_dtest_win32/100/testReport/junit/incremental_repair_test/TestIncRepair/multiple_repair_test/history/
> The error is {{An existing connection was forcibly closed by the remote 
> host}}, and happens consistently in the failing runs:
> http://cassci.datastax.com/view/win32/job/cassandra-3.0_dtest_win32/100/testReport/junit/incremental_repair_test/TestIncRepair/multiple_repair_test/
> http://cassci.datastax.com/view/win32/job/cassandra-3.0_dtest_win32/72/testReport/junit/incremental_repair_test/TestIncRepair/multiple_repair_test/
> [~yukim] Can you have a look? I feel like you're more likely than anyone else 
> to understand the streaming error. In particular: is this what happens when a 
> node goes down? This could be an environment error, rather than a C* bug.



