Hi,

We are running Cassandra 2.1.14 on an IBM AIX cluster using IBM Java 7 (1.7.1.64). I am having problems adding new nodes to the cluster and am seeing the following exception. It appears that the new node gets stuck trying to send the magic number on the first streaming socket, while the receiving node never receives it and times out after 10 seconds.

New Node:

INFO [StreamConnectionEstablisher:1] 2017-04-28 17:39:20,196 StreamSession.java:220 - [Stream #22c10290-2c5b-11e7-a33c-8f9ab3a4bd92] Starting streaming to /1.2.3.4
INFO [StreamConnectionEstablisher:2] 2017-04-28 17:39:20,197 StreamSession.java:220 - [Stream #22c10290-2c5b-11e7-a33c-8f9ab3a4bd92] Starting streaming to /5.6.7.8
INFO [StreamConnectionEstablisher:1] 2017-04-28 17:39:20,209 StreamCoordinator.java:209 - [Stream #22c10290-2c5b-11e7-a33c-8f9ab3a4bd92, ID#0] Beginning stream session with /1.2.3.4
INFO [STREAM-IN-/1.2.3.4] 2017-04-28 17:39:20,276 StreamResultFuture.java:166 - [Stream #22c10290-2c5b-11e7-a33c-8f9ab3a4bd92 ID#0] Prepare completed. Receiving 2 files(43103 bytes), sending 0 files(0 bytes)
INFO [StreamReceiveTask:2] 2017-04-28 17:39:20,410 StreamResultFuture.java:180 - [Stream #22c10290-2c5b-11e7-a33c-8f9ab3a4bd92] Session with /1.2.3.4 is complete
ERROR [StreamConnectionEstablisher:2] 2017-04-28 17:39:30,207 StreamSession.java:505 - [Stream #22c10290-2c5b-11e7-a33c-8f9ab3a4bd92] Streaming error occurred
java.nio.channels.AsynchronousCloseException: null
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:224) ~[na:1.7.0]
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:538) ~[na:1.7.0]
    at org.apache.cassandra.io.util.DataOutputStreamAndChannel.write(DataOutputStreamAndChannel.java:48) ~[apache-cassandra-2.1.14.jar:2.1.14]
    at org.apache.cassandra.streaming.ConnectionHandler$MessageHandler.sendInitMessage(ConnectionHandler.java:191) ~[apache-cassandra-2.1.14.jar:2.1.14]
    at org.apache.cassandra.streaming.ConnectionHandler.initiate(ConnectionHandler.java:81) ~[apache-cassandra-2.1.14.jar:2.1.14]
    at org.apache.cassandra.streaming.StreamSession.start(StreamSession.java:223) ~[apache-cassandra-2.1.14.jar:2.1.14]
    at org.apache.cassandra.streaming.StreamCoordinator$StreamSessionConnector.run(StreamCoordinator.java:208) [apache-cassandra-2.1.14.jar:2.1.14]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1157) [na:1.7.0]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:627) [na:1.7.0]
    at java.lang.Thread.run(Thread.java:809) [na:1.7.0]
INFO [StreamConnectionEstablisher:2] 2017-04-28 17:39:30,208 StreamResultFuture.java:180 - [Stream #22c10290-2c5b-11e7-a33c-8f9ab3a4bd92] Session with /5.6.7.8 is complete
WARN [StreamConnectionEstablisher:2] 2017-04-28 17:39:30,211 StreamResultFuture.java:207 - [Stream #22c10290-2c5b-11e7-a33c-8f9ab3a4bd92] Stream failed
INFO [StreamConnectionEstablisher:2] 2017-04-28 17:39:30,212 StreamCoordinator.java:209 - [Stream #22c10290-2c5b-11e7-a33c-8f9ab3a4bd92, ID#0] Beginning stream session with /5.6.7.8
ERROR [main] 2017-04-28 17:39:30,213 CassandraDaemon.java:581 - Exception encountered during startup
java.lang.RuntimeException: Error during boostrap: Stream failed
    at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:86) ~[apache-cassandra-2.1.14.jar:2.1.14]
    at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1166) ~[apache-cassandra-2.1.14.jar:2.1.14]

Existing node:

DEBUG [ACCEPT-/5.6.7.8] 2017-04-28 17:39:29,914 MessagingService.java:1014 - Error reading the socket Socket[addr=/9.0.1.2,port=55848,localport=7000]
java.net.SocketTimeoutException: null
    at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:242) ~[na:1.7.0]
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:116) ~[na:1.7.0]
    at java.io.DataInputStream.readFully(DataInputStream.java:207) ~[na:1.7.0]
    at java.io.DataInputStream.readInt(DataInputStream.java:399) ~[na:1.7.0]
    at org.apache.cassandra.net.MessagingService$SocketThread.run(MessagingService.java:988) ~[apache-cassandra-2.1.14.jar:2.1.14]
TRACE [MessagingService-Incoming-/9.0.1.2] 2017-04-28 17:39:29,989 IncomingTcpConnection.java:92 - eof reading from socket; closing
java.io.EOFException: null
    at java.io.DataInputStream.readFully(DataInputStream.java:209) ~[na:1.7.0]
    at java.io.DataInputStream.readInt(DataInputStream.java:399) ~[na:1.7.0]
    at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:171) ~[apache-cassandra-2.1.14.jar:2.1.14]
    at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:88) ~[apache-cassandra-2.1.14.jar:2.1.14]
TRACE [MessagingService-Incoming-/9.0.1.2] 2017-04-28 17:39:29,990 IncomingTcpConnection.java:115 - Closing socket Socket[addr=/9.0.1.2,port=55840,localport=7000] - isclosed: false
TRACE [MessagingService-Incoming-/9.0.1.2] 2017-04-28 17:39:29,991 IncomingTcpConnection.java:92 - eof reading from socket; closing
java.io.EOFException: null
    at java.io.DataInputStream.readFully(DataInputStream.java:209) ~[na:1.7.0]
    at java.io.DataInputStream.readInt(DataInputStream.java:399) ~[na:1.7.0]
    at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:171) ~[apache-cassandra-2.1.14.jar:2.1.14]
    at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:88) ~[apache-cassandra-2.1.14.jar:2.1.14]

Everything works fine when bringing up the new node until it gets to streaming. A Wireshark capture shows that nothing is sent on the streaming socket. We put the existing cluster nodes in trace and didn't see anything very exciting (apart from the error; we still need to add trace logs to the client). Other Java processes are networking on the same system without problems, and the resource limit values appear to be set correctly.

We experimented with bootstrapping against a cluster with zero data and against one with full data. With zero data we were able to create a cluster of three nodes (though if we started with the "wrong node" we couldn't create a cluster of size greater than one), and with full data (9GB) we were able to create a cluster of only two nodes.

The clocks on the nodes may be off by up to a second. Would that be enough to cause any trouble when bootstrapping?

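
One way to take Cassandra out of the picture would be a small standalone probe that mimics the two ends of the handshake seen in the traces: a blocking SocketChannel write of a 4-byte magic number on one side, and a DataInputStream.readInt() with a read timeout on the other. The following is only a rough sketch under assumptions, not Cassandra's actual streaming code. The class name StreamHandshakeProbe, its method names and its command-line arguments are made up for illustration, and the magic value is meant to match MessagingService.PROTOCOL_MAGIC. It should be pointed at a spare port rather than at port 7000 on a live node.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

// Hypothetical diagnostic, not part of Cassandra.
class StreamHandshakeProbe {
    // Assumed to be the same value as MessagingService.PROTOCOL_MAGIC.
    public static final int PROTOCOL_MAGIC = 0xCA552DFA;

    // Sender side: blocking SocketChannel write, like the write in the new node's trace.
    public static void sendMagic(InetSocketAddress target) throws IOException {
        try (SocketChannel channel = SocketChannel.open(target)) {
            ByteBuffer buf = ByteBuffer.allocate(4);
            buf.putInt(PROTOCOL_MAGIC);
            buf.flip();
            while (buf.hasRemaining()) {
                channel.write(buf);
            }
        }
    }

    // Receiver side: accept one connection and read an int with a read timeout,
    // like the readInt() that times out in the existing node's trace.
    public static int receiveMagic(ServerSocket server, int timeoutMillis) throws IOException {
        try (Socket socket = server.accept()) {
            socket.setSoTimeout(timeoutMillis);
            return new DataInputStream(socket.getInputStream()).readInt();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 2 && args[0].equals("listen")) {
            try (ServerSocket server = new ServerSocket(Integer.parseInt(args[1]))) {
                int magic = receiveMagic(server, 10000); // 10 s, as observed on the receiver
                System.out.println("received 0x" + Integer.toHexString(magic));
            }
        } else if (args.length == 3 && args[0].equals("send")) {
            sendMagic(new InetSocketAddress(args[1], Integer.parseInt(args[2])));
            System.out.println("magic sent");
        } else {
            System.err.println("usage: listen <port> | send <host> <port>");
        }
    }
}
```

Running "listen <port>" on an existing node and "send <host> <port>" on the new node, both under the same IBM JVM, would show whether a plain blocking channel write between the two hosts behaves any differently outside Cassandra.
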
Has anyone seen something like this before? I haven't found anything so far in the bug tracker, Google searches, or mailing lists that matches this behaviour (though I could have missed something). Of course, this could be an AIX/IBM Java-specific issue (I know the recommendation is to use the Oracle JVM, and AIX is not a standard Cassandra configuration).

Any suggestions would be appreciated.

Thanks in advance,
Gareth