Paulo Motta created CASSANDRA-11839:
---------------------------------------
Summary: Active streams fail with SocketTimeoutException
Key: CASSANDRA-11839
URL: https://issues.apache.org/jira/browse/CASSANDRA-11839
Project: Cassandra
Issue Type: Bug
Reporter: Paulo Motta
Assignee: Paulo Motta
The original reasoning behind {{streaming_socket_timeout_in_ms}} was to kill
one-sided hanging streams (CASSANDRA-3838). This was never much of a problem
when the default was zero (never time out).
In CASSANDRA-8611 we changed the default to 1 hour, but it was never enforced
because of CASSANDRA-11286, which was fixed recently.
In recent releases we've been receiving reports of stream failures when
streaming large files: the sender's incoming socket becomes inactive, times
out after 1 hour, and the stream session fails with
{{SocketTimeoutException}} (CASSANDRA-11345, CASSANDRA-11826), even though the
stream session is still active. The session also fails if a 2i/MV rebuild
takes longer than 1 hour on the receiver (CASSANDRA-8343).
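
The failure mode can be reproduced in isolation. The following is a minimal
sketch, not Cassandra code: the class and method names are hypothetical, and a
short {{SO_TIMEOUT}} stands in for {{streaming_socket_timeout_in_ms}}. It only
shows that a blocking read on a healthy but silent connection fails with
{{SocketTimeoutException}} once the timeout elapses.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class; not part of Cassandra.
class IdleSocketDemo {
    /**
     * Returns true if a blocking read on a healthy but silent connection
     * fails with SocketTimeoutException after timeoutMs.
     */
    static boolean idleReadTimesOut(int timeoutMs) throws IOException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket sender = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket peer = server.accept()) {
            // Stand-in for streaming_socket_timeout_in_ms (assumed small value for the demo).
            sender.setSoTimeout(timeoutMs);
            InputStream in = sender.getInputStream();
            try {
                // The peer is connected and alive but writes nothing, so this read blocks.
                in.read();
                return false;
            } catch (SocketTimeoutException e) {
                // The connection is fine, yet the read gave up after timeoutMs.
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("timed out: " + idleReadTimesOut(200));
    }
}
```

The timeout cannot tell a hung peer from a busy one: both look like silence on
the socket.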
The definitive fix on trunk is to add a {{KeepAlive}} message to the stream
protocol to detect broken connections and retire
{{streaming_socket_timeout_in_ms}}. But we must also increase the default
{{streaming_socket_timeout_in_ms}} in older versions to a more conservative
value, so that it can still detect long-hanging streams.
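
To illustrate why a keep-alive message addresses the problem, here is a rough
sketch under stated assumptions. It is not the stream protocol change itself:
the message codes, class name, and method are hypothetical, and single bytes
stand in for real stream messages. The busy peer writes a keep-alive at a
fixed interval, so the waiting side's read timeout only fires when the
connection is really silent.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical sketch; not Cassandra's stream protocol.
class KeepAliveSketch {
    static final int KEEP_ALIVE = 0; // assumed message codes for the demo
    static final int COMPLETE = 1;

    /**
     * The peer stays busy for about busyMs before sending COMPLETE. With
     * keepAlive enabled it writes KEEP_ALIVE every intervalMs in the meantime.
     * Returns true if the waiting side sees COMPLETE before its read timeout fires.
     */
    static boolean sessionCompletes(boolean keepAlive, int timeoutMs, int intervalMs, int busyMs)
            throws IOException, InterruptedException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket waiting = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket busy = server.accept()) {
            waiting.setSoTimeout(timeoutMs);

            Thread peer = new Thread(() -> {
                try {
                    OutputStream out = busy.getOutputStream();
                    long deadline = System.currentTimeMillis() + busyMs;
                    while (System.currentTimeMillis() < deadline) {
                        if (keepAlive) {
                            out.write(KEEP_ALIVE);
                            out.flush();
                        }
                        Thread.sleep(intervalMs);
                    }
                    out.write(COMPLETE);
                    out.flush();
                } catch (IOException | InterruptedException e) {
                    // The waiting side already gave up; nothing to do in this sketch.
                }
            });
            peer.start();

            boolean completed = false;
            try {
                InputStream in = waiting.getInputStream();
                int msg;
                while ((msg = in.read()) != -1) {
                    if (msg == COMPLETE) {
                        completed = true;
                        break;
                    }
                    // KEEP_ALIVE is ignored; each read() call gets a fresh timeout.
                }
            } catch (SocketTimeoutException e) {
                // No traffic at all within timeoutMs.
            }
            peer.interrupt();
            peer.join();
            return completed;
        }
    }
}
```

With a 300 ms timeout and a peer that is busy for 600 ms,
{{sessionCompletes(true, 300, 50, 600)}} finishes normally, while the same call
with {{keepAlive}} set to false times out.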