[
https://issues.apache.org/jira/browse/CASSANDRA-12886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15660915#comment-15660915
]
Bing Wu edited comment on CASSANDRA-12886 at 11/13/16 5:46 AM:
---------------------------------------------------------------
[~pauloricardomg] To answer your questions:
*Q:* Can you check if the source/destination STREAM-(IN/OUT)-IP of failed
streams is the same throughout the cluster?
*A:* Yes. More details: not all nodes in the cluster reported the SSL failure;
about 7 out of 30 nodes did. To find them, I searched system.log on every node
for "java.net.SocketException: Connection reset" at the timestamps when the
"initiator" (the host that was running repair) reported failures. I can confirm
that those failures all pointed back to the initiator, e.g.:
{noformat}
ERROR [StreamConnectionEstablisher:6] 2016-11-10 22:23:30,303 StreamSession.java:529 - [Stream #496c6de0-a794-11e6-bf13-7df2869901ea] Streaming error occurred on session with peer *initiator-public-ip*
{noformat}
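The search described above can be sketched in Java. This is a hypothetical helper (`StreamErrorPeers` and `peersNear` are illustrative names, not Cassandra tooling). It assumes each log entry sits on a single line as Cassandra writes it (the wrapping in this email is an artifact), and, as a variant of the manual search, it matches the StreamSession ERROR header line rather than the "Connection reset" text, which appears further down in the stack trace.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: peers of streaming errors logged near a given failure time. */
final class StreamErrorPeers {
    // Assumed single-line layout:
    // ERROR [thread] yyyy-MM-dd HH:mm:ss,SSS ... Streaming error occurred on session with peer <addr>
    private static final Pattern STREAM_ERROR = Pattern.compile(
            "^ERROR \\[[^\\]]+\\] (\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}) "
            + ".*Streaming error occurred on session with peer (\\S+)");

    private static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /** Returns the peer of every streaming error logged within +/- window of failure. */
    static List<String> peersNear(List<String> lines, LocalDateTime failure, Duration window) {
        List<String> peers = new ArrayList<>();
        for (String line : lines) {
            Matcher m = STREAM_ERROR.matcher(line);
            if (!m.find()) {
                continue; // not a StreamSession error header
            }
            LocalDateTime logged = LocalDateTime.parse(m.group(1), TS);
            // Absolute distance between the log entry and the initiator's failure time.
            if (Duration.between(failure, logged).abs().compareTo(window) <= 0) {
                peers.add(m.group(2));
            }
        }
        return peers;
    }
}
```

For example, given the ERROR line from the issue description below, a failure time of 2016-11-07 07:31:00 and a one-minute window, it returns [54.247.111.232].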
*Q:* What are your tcp_keepalive settings? (see tuning guide here)
*A:* They are what's recommended:
{noformat}
$ sudo sysctl -A | grep net.ipv4.tcp_keep
net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 10
{noformat}
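For reference, these values bound how long the kernel takes to drop an idle connection whose peer has silently gone away: it waits tcp_keepalive_time seconds, then sends up to tcp_keepalive_probes probes spaced tcp_keepalive_intvl seconds apart. A minimal sketch of that arithmetic (`KeepaliveMath` is a hypothetical name). These sysctls only affect sockets with SO_KEEPALIVE enabled (e.g. via java.net.Socket#setKeepAlive(true)), and they detect silent peers, not an explicit connection reset.

```java
/**
 * Hypothetical helper: rough upper bound on how long Linux TCP keepalive takes to
 * drop an idle connection whose peer has silently disappeared. The kernel waits
 * tcp_keepalive_time seconds of idleness, then sends up to tcp_keepalive_probes
 * unanswered probes, tcp_keepalive_intvl seconds apart.
 */
final class KeepaliveMath {
    static long deadPeerSeconds(long timeSec, long probes, long intvlSec) {
        return timeSec + probes * intvlSec;
    }

    public static void main(String[] args) {
        // Values from the sysctl output above: 60 + 3 * 10.
        System.out.println(deadPeerSeconds(60, 3, 10) + " s"); // prints "90 s"
    }
}
```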
*Q:* Also, can you paste full debug.log sample of a node with this error?
*A:* Here is the debug.log file on a remote machine (not the initiator)
debug.log.2016-11-10_2319.gz
> Streaming failed due to SSL Socket connection reset
> ---------------------------------------------------
>
> Key: CASSANDRA-12886
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12886
> Project: Cassandra
> Issue Type: Bug
> Reporter: Bing Wu
> Attachments: debug.log.2016-11-10_2319.gz
>
>
> While running "nodetool repair", I see many instances of
> "javax.net.ssl.SSLException: java.net.SocketException: Connection reset" in
> system.log on some nodes in the cluster. The timestamps match the streaming
> source/initiator's "sync failed between ..." error messages.
> Setup:
> - Cassandra 3.7.01
> - CentOS 6.7 in AWS (multi-region)
> - JDK version: {noformat}
> java version "1.8.0_102"
> Java(TM) SE Runtime Environment (build 1.8.0_102-b14)
> Java HotSpot(TM) 64-Bit Server VM (build 25.102-b14, mixed mode)
> {noformat}
> - cassandra.yaml:
> {noformat}
> server_encryption_options:
>     internode_encryption: all
>     keystore: [path]
>     keystore_password: [password]
>     truststore: [path]
>     truststore_password: [password]
>     # More advanced defaults below:
>     # protocol: TLS
>     # algorithm: SunX509
>     # store_type: JKS
>     # cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA]
>     require_client_auth: false
> {noformat}
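To rule out a cipher-suite mismatch on the JVM in use (1.8.0_102 here), one quick check is to compare the suite list from the commented-out cipher_suites line against what JSSE reports as supported. A minimal sketch, assuming the default JSSE provider; `CipherSuiteCheck` is a hypothetical helper, not part of Cassandra.

```java
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.net.ssl.SSLContext;

/** Hypothetical check: which configured cipher suites does this JVM not support? */
final class CipherSuiteCheck {
    // The cipher_suites list from the (commented-out) cassandra.yaml line above.
    static final String[] CONFIGURED = {
        "TLS_RSA_WITH_AES_128_CBC_SHA",
        "TLS_RSA_WITH_AES_256_CBC_SHA",
        "TLS_DHE_RSA_WITH_AES_128_CBC_SHA",
        "TLS_DHE_RSA_WITH_AES_256_CBC_SHA",
        "TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA",
        "TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA",
    };

    /** Returns the entries of desired that are absent from supported, in order. */
    static List<String> unsupported(String[] desired, String[] supported) {
        Set<String> available = new HashSet<>(Arrays.asList(supported));
        List<String> missing = new ArrayList<>();
        for (String suite : desired) {
            if (!available.contains(suite)) {
                missing.add(suite);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // Suites the default JSSE provider can negotiate on this JVM.
        String[] supported = SSLContext.getDefault()
                .getSupportedSSLParameters()
                .getCipherSuites();
        System.out.println("Unsupported: " + unsupported(CONFIGURED, supported));
    }
}
```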
> Error messages in system.log on the target host:
> {noformat}
> ERROR [STREAM-OUT-/54.247.111.232:7001] 2016-11-07 07:30:56,475 StreamSession.java:529 - [Stream #e14abcb0-a4bb-11e6-9758-55b9ac38b78e] Streaming error occurred on session with peer 54.247.111.232
> javax.net.ssl.SSLException: Connection has been shutdown: javax.net.ssl.SSLException: java.net.SocketException: Connection reset
> 	at sun.security.ssl.SSLSocketImpl.checkEOF(SSLSocketImpl.java:1541) ~[na:1.8.0_102]
> 	at sun.security.ssl.SSLSocketImpl.checkWrite(SSLSocketImpl.java:1553) ~[na:1.8.0_102]
> 	at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:71) ~[na:1.8.0_102]
> 	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) ~[na:1.8.0_102]
> 	at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) ~[na:1.8.0_102]
> 	at org.apache.cassandra.io.util.WrappedDataOutputStreamPlus.flush(WrappedDataOutputStreamPlus.java:66) ~[apache-cassandra-3.7.0.jar:3.7.0]
> 	at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:371) [apache-cassandra-3.7.0.jar:3.7.0]
> 	at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:342) [apache-cassandra-3.7.0.jar:3.7.0]
> 	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_102]
> Caused by: javax.net.ssl.SSLException: java.net.SocketException: Connection reset
> {noformat}
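When triaging many of these traces, it can help to classify them by root cause rather than by the outermost message. A minimal sketch of a hypothetical helper (`ResetDetector` is an illustrative name) that walks the cause chain looking for a java.net.SocketException whose message contains "Connection reset". The exceptions in main are built by hand to mirror the shape of the trace above; they are not captured from Cassandra.

```java
import java.net.SocketException;
import javax.net.ssl.SSLException;

/** Hypothetical triage helper: does a failure ultimately stem from a TCP reset? */
final class ResetDetector {
    static boolean causedByConnectionReset(Throwable t) {
        // Depth limit guards against pathological cyclic cause chains.
        for (int depth = 0; t != null && depth < 32; depth++, t = t.getCause()) {
            if (t instanceof SocketException
                    && t.getMessage() != null
                    && t.getMessage().contains("Connection reset")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Same shape as the trace: SSLException -> SSLException -> SocketException.
        SocketException root = new SocketException("Connection reset");
        SSLException inner = new SSLException(root);
        SSLException outer = new SSLException("Connection has been shutdown: " + inner, inner);
        System.out.println(causedByConnectionReset(outer)); // prints true
    }
}
```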