[
https://issues.apache.org/jira/browse/CASSANDRA-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998789#comment-14998789
]
Paulo Motta commented on CASSANDRA-10644:
-----------------------------------------
Basically there is a 1 second gap between when the repair stream connection
handler is closed and when the actual outgoing socket is gracefully closed
({{messageQueue.poll(1, TimeUnit.SECONDS)}} in {{OutgoingMessageHandler}}).
When node2 is abruptly stopped before that 1 second has passed, the incoming
socket on the other side (node3) is closed gracefully on Linux, but not on
Windows (see this StackOverflow
[thread|http://stackoverflow.com/questions/22931811/differences-on-java-sockets-between-windows-and-linux-how-to-handle-them]
for more details).
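
To make the timing concrete, here is a minimal Python analogue of that gap. It
is only a sketch, not the Cassandra code: {{outgoing_loop}},
{{measure_close_gap}} and their parameters are hypothetical names, and
{{queue.Queue.get(timeout=...)}} stands in for the Java
{{messageQueue.poll(1, TimeUnit.SECONDS)}}. The worker only notices the
"closed" flag once its current poll times out, so the socket close can lag the
handler close by up to the poll timeout.

```python
import queue
import threading
import time


def outgoing_loop(message_queue, closed, events, poll_timeout):
    """Hypothetical stand-in for the outgoing handler loop."""
    while not closed.is_set():
        try:
            # Blocks for up to poll_timeout, like poll(1, TimeUnit.SECONDS).
            msg = message_queue.get(timeout=poll_timeout)
        except queue.Empty:
            continue  # timed out: loop around and re-check the closed flag
        events.append(("sent", msg))
    # Only here would the real handler gracefully close the socket.
    events.append(("socket_closed", time.monotonic()))


def measure_close_gap(poll_timeout=1.0, close_after=0.2):
    """Return seconds between 'handler closed' and 'socket closed'."""
    message_queue = queue.Queue()
    closed = threading.Event()
    events = []
    worker = threading.Thread(
        target=outgoing_loop,
        args=(message_queue, closed, events, poll_timeout),
    )
    worker.start()
    time.sleep(close_after)        # let the worker block inside get()
    closed_at = time.monotonic()
    closed.set()                   # handler is closed here ...
    worker.join()                  # ... socket close happens some time later
    return events[-1][1] - closed_at


if __name__ == "__main__":
    print("gap in seconds:", round(measure_close_gap(), 2))
```

With the default 1 second timeout the measured gap stays below roughly 1
second. A node killed inside that window never reaches the graceful close.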
Most of the time the test does not fail because node2 is stopped after this 1
second period, so a quick and dirty fix is to sleep for 2 seconds on Windows
before abruptly stopping node2 after a repair session. Since this is a very
specific and unlikely situation, I think it's enough to address this only in
the dtest. WDYT [~yukim]?
Created [dtest PR|https://github.com/riptano/cassandra-dtest/pull/654] with
the quick and dirty fix.
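
The workaround described above could look roughly like the sketch below. This
is not the code from the PR: {{stop_abruptly_after_repair}} and {{is_windows}}
are hypothetical helpers, and the {{node.stop(gently=False)}} call assumes a
ccm-style node object whose {{stop}} method accepts a {{gently}} flag. The 2
second delay is the value proposed above, chosen to exceed the 1 second close
gap.

```python
import platform
import time

# Longer than the 1 second gap before the outgoing socket is closed.
WINDOWS_STOP_DELAY_SECONDS = 2


def is_windows(system_name=None):
    """True on Windows; system_name can be injected for testing."""
    name = platform.system() if system_name is None else system_name
    return name == "Windows"


def stop_abruptly_after_repair(node, system_name=None, sleep=time.sleep):
    """Stop a node non-gracefully, waiting first on Windows only.

    `node` is assumed to expose stop(gently=...), as ccm nodes used by the
    dtests do; this helper itself is hypothetical.
    """
    if is_windows(system_name):
        # Give the peer time to see a graceful socket close first.
        sleep(WINDOWS_STOP_DELAY_SECONDS)
    node.stop(gently=False)
```

On Linux the helper stops the node immediately, so the extra delay is paid
only on the platform where the race is observable.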
> multiple repair dtest fails under Windows
> -----------------------------------------
>
> Key: CASSANDRA-10644
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10644
> Project: Cassandra
> Issue Type: Sub-task
> Reporter: Jim Witschey
> Assignee: Paulo Motta
> Fix For: 3.1, 2.2.x
>
>
> {{incremental_repair_test.py:TestIncRepair.multiple_repair_test}} flaps on
> CassCI Windows runs on C* 3.0:
> http://cassci.datastax.com/view/win32/job/cassandra-3.0_dtest_win32/100/testReport/junit/incremental_repair_test/TestIncRepair/multiple_repair_test/history/
> The error is {{An existing connection was forcibly closed by the remote
> host}}, and happens consistently in the failing runs:
> http://cassci.datastax.com/view/win32/job/cassandra-3.0_dtest_win32/100/testReport/junit/incremental_repair_test/TestIncRepair/multiple_repair_test/
> http://cassci.datastax.com/view/win32/job/cassandra-3.0_dtest_win32/72/testReport/junit/incremental_repair_test/TestIncRepair/multiple_repair_test/
> [~yukim] Can you have a look? I feel like you're more likely than anyone else
> to understand the streaming error. In particular: is this what happens when a
> node goes down? This could be an environment error, rather than a C* bug.