[ 
https://issues.apache.org/jira/browse/CASSANDRA-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429722#comment-16429722
 ] 

Dinesh Joshi edited comment on CASSANDRA-14362 at 4/9/18 4:58 PM:
------------------------------------------------------------------

[~jasobrown] / [~aweisberg] I think I know what the issue is. [~jasobrown] not 
only changed the {{OutboundConnectionIdentifier}} to prefer the local address 
of the node but also removed the binding to a specific local IP address during 
{{NettyFactory#createOutboundBootstrap}} setup. This had an unintended 
consequence. When you leave the {{Socket}}'s source IP (local address) unset, 
the kernel will choose one for you. This normally is not an issue. However, 
during the dtest run all nodes are local and rely on 127.0.0.x IPs. When the C* 
nodes try to setup the streaming connections, they get tripped up due to this 
change.

Say node1 (127.0.0.1) & node2 (127.0.0.2) are started up by the dtest (I know 
in reality its 3 nodes but for the sake of simplicity I'm limiting it to two 
nodes). node1 attempts to create an outbound connection for streaming with 
node2, the outbound socket should actually be {{127.0.0.1:ANY_PORT -> 
127.0.0.2:INTERNODE_PORT}}. However, per this patch, we leave the source IP 
unbound. When the connection is established and node1 sends the 
{{FirstHandshakeMessage}}, it has a source IP set to set to an arbitrarily 
chose IP by the kernel - in this case it was {{127.0.0.2}} (when it should 
actually be {{127.0.0.1}}).

The receiving node (node2) gets this message 
{{InboundHandshakeHandler#handleStart}}, decodes it fine but when it goes to 
{{InboundHandshakeHandler#setupStreamingPipeline}}, it creates a 
{{StreamingInboundHandler}} with remote IP set to {{127.0.0.2}}. This messes up 
the streaming pipeline because now node2 is waiting on {{127.0.0.2}} (itself) 
for data. Hence we find it stuck in the bootstrap phase -
{noformat}
"Stream-Deserializer-127.0.0.2:7000-4004ef3b" #133 daemon prio=5 os_prio=35 
tid=0x00007ff58c601ca0 nid=0xdc03 waiting on condition [0x0000700007613000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at java.lang.Thread.sleep(Thread.java:340)
        at java.util.concurrent.TimeUnit.sleep(TimeUnit.java:386)
        at 
com.google.common.util.concurrent.Uninterruptibles.sleepUninterruptibly(Uninterruptibles.java:285)
        at 
org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:174)
        at java.lang.Thread.run(Thread.java:748)

"main" #1 prio=5 os_prio=35 tid=0x00007ff58c706000 nid=0x2803 waiting on 
condition [0x0000700004e5e000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000007a11a3f98> (a 
org.apache.cassandra.streaming.StreamResultFuture)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at 
com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:497)
        at 
org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1519)
        at 
org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:949)
        at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:651)
        - locked <0x00000007a40f80b8> (a 
org.apache.cassandra.service.StorageService)
        at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:586)
        - locked <0x00000007a40f80b8> (a 
org.apache.cassandra.service.StorageService)
        at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:367)
        at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:590)
        at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:679)
{noformat}
I also noted this error in the logs which further points to the wrong IP for 
the peer being used -
{noformat}
ERROR [Stream-Deserializer-127.0.0.2:53830-7b781083] 2018-04-06 13:19:25,863 
StreamingInboundHandler.java:210 - [Stream channel: 7b781083] stream operation 
from 127.0.0.2:53830 failed
java.lang.IllegalArgumentException: Unknown peer requested: 127.0.0.2:7000
        at 
org.apache.cassandra.streaming.StreamCoordinator.getHostData(StreamCoordinator.java:242)
        at 
org.apache.cassandra.streaming.StreamCoordinator.getSessionById(StreamCoordinator.java:170)
        at 
org.apache.cassandra.streaming.StreamResultFuture.getSession(StreamResultFuture.java:237)
        at 
org.apache.cassandra.streaming.StreamManager.findSession(StreamManager.java:194)
        at 
org.apache.cassandra.streaming.StreamManager.findSession(StreamManager.java:185)
        at 
org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:41)
        at 
org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:36)
        at 
org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:55)
        at 
org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:177)
        at java.lang.Thread.run(Thread.java:748)
{noformat}
I added a bit of debugging code and turned {{TRACE}} logging on and confirmed 
that this was indeed the case.

Below is the output of node2's debug.log in the 3 node cluster that dtest run 
created. node1, 2 & 3 were assigned IPs {{127.0.0.1, 127.0.0.2, 127.0.0.3}} 
respectively. Note that node2's actual IP is never used as we've left it 
unbound so it seems the OS kernel prefers the IP that it thinks is closest to 
the remote IP so channel local & remote as the showing up as the same IP.
{noformat}
03:07 $ tail -F 
/var/folders/5f/_29kl4f524l7gp__02cxby_c0000gn/T/dtest-wu9pai7a/test/node2/logs/debug.log
  | grep handshake
TRACE [MessagingService-NettyOutbound-Thread-4-1] 2018-04-08 03:07:55,360 
OutboundHandshakeHandler.java:107 - starting handshake with peer 
127.0.0.1:7000, msg = FirstHandshakeMessage - messaging version: 12, mode: 
MESSAGING, compress: false, channel local = /127.0.0.1:65020 remote = 
/127.0.0.1:7000
TRACE [MessagingService-NettyOutbound-Thread-4-3] 2018-04-08 03:09:21,427 
OutboundHandshakeHandler.java:107 - starting handshake with peer 
127.0.0.3:7000, msg = FirstHandshakeMessage - messaging version: 12, mode: 
MESSAGING, compress: false, channel local = /127.0.0.3:65091 remote = 
/127.0.0.3:7000
{noformat}
I only modified the trace log line in 
{{OutboundHandshakeHandler#channelActive}} as below -
{noformat}
        logger.trace("starting handshake with peer {}, msg = {}, channel local 
= {} remote = {}", connectionId.connectionAddress(),
                     msg, ctx.channel().localAddress(), 
ctx.channel().remoteAddress());
{noformat}

What is unclear to me is why is this an issue now if CASSANDRA-12673 left the 
local side unbound?


was (Author: djoshi3):
[~jasobrown] / [~aweisberg] I think I know what the issue is. [~jasobrown] not 
only changed the {{OutboundConnectionIdentifier}} to prefer the local address 
of the node but also removed the binding to a specific local IP address during 
{{NettyFactory#createOutboundBootstrap}} setup. This had an unintended 
consequence. When you leave the {{Socket}}'s source IP (local address) unset, 
the kernel will choose one for you. This normally is not an issue. However, 
during the dtest run all nodes are local and rely on 127.0.0.x IPs. When the C* 
nodes try to setup the streaming connections, they get tripped up due to this 
change.

Say node1 (127.0.0.1) & node2 (127.0.0.2) are started up by the dtest (I know 
in reality its 3 nodes but for the sake of simplicity I'm limiting it to two 
nodes). node1 attempts to create an outbound connection for streaming with 
node2, the outbound socket should actually be {{127.0.0.1:ANY_PORT -> 
127.0.0.2:INTERNODE_PORT}}. However, per this patch, we leave the source IP 
unbound. When the connection is established and node1 sends the 
{{FirstHandshakeMessage}}, it has a source IP set to set to an arbitrarily 
chose IP by the kernel - in this case it was {{127.0.0.2}} (when it should 
actually be {{127.0.0.1}}).

The receiving node (node2) gets this message 
{{InboundHandshakeHandler#handleStart}}, decodes it fine but when it goes to 
{{InboundHandshakeHandler#setupStreamingPipeline}}, it creates a 
{{StreamingInboundHandler}} with remote IP set to {{127.0.0.2}}. This messes up 
the streaming pipeline because now node2 is waiting on {{127.0.0.2}} (itself) 
for data. Hence we find it stuck in the bootstrap phase -
{noformat}
"Stream-Deserializer-127.0.0.2:7000-4004ef3b" #133 daemon prio=5 os_prio=35 
tid=0x00007ff58c601ca0 nid=0xdc03 waiting on condition [0x0000700007613000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at java.lang.Thread.sleep(Thread.java:340)
        at java.util.concurrent.TimeUnit.sleep(TimeUnit.java:386)
        at 
com.google.common.util.concurrent.Uninterruptibles.sleepUninterruptibly(Uninterruptibles.java:285)
        at 
org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:174)
        at java.lang.Thread.run(Thread.java:748)

"main" #1 prio=5 os_prio=35 tid=0x00007ff58c706000 nid=0x2803 waiting on 
condition [0x0000700004e5e000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000007a11a3f98> (a 
org.apache.cassandra.streaming.StreamResultFuture)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at 
com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:497)
        at 
org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1519)
        at 
org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:949)
        at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:651)
        - locked <0x00000007a40f80b8> (a 
org.apache.cassandra.service.StorageService)
        at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:586)
        - locked <0x00000007a40f80b8> (a 
org.apache.cassandra.service.StorageService)
        at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:367)
        at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:590)
        at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:679)
{noformat}
I also noted this error in the logs which further points to the wrong IP for 
the peer being used -
{noformat}
ERROR [Stream-Deserializer-127.0.0.2:53830-7b781083] 2018-04-06 13:19:25,863 
StreamingInboundHandler.java:210 - [Stream channel: 7b781083] stream operation 
from 127.0.0.2:53830 failed
java.lang.IllegalArgumentException: Unknown peer requested: 127.0.0.2:7000
        at 
org.apache.cassandra.streaming.StreamCoordinator.getHostData(StreamCoordinator.java:242)
        at 
org.apache.cassandra.streaming.StreamCoordinator.getSessionById(StreamCoordinator.java:170)
        at 
org.apache.cassandra.streaming.StreamResultFuture.getSession(StreamResultFuture.java:237)
        at 
org.apache.cassandra.streaming.StreamManager.findSession(StreamManager.java:194)
        at 
org.apache.cassandra.streaming.StreamManager.findSession(StreamManager.java:185)
        at 
org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:41)
        at 
org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:36)
        at 
org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:55)
        at 
org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:177)
        at java.lang.Thread.run(Thread.java:748)
{noformat}
I added a bit of debugging code and turned {{TRACE}} logging on and confirmed 
that this was indeed the case.

Below is the output of node2's debug.log in the 3 node cluster that dtest run 
created. node1, 2 & 3 were assigned IPs {{127.0.0.1, 127.0.0.2, 127.0.0.3}} 
respectively. Note that node2's actual IP is never used as we've left it 
unbound so it seems the OS kernel prefers the IP that it thinks is closest to 
the remote IP so channel local & remote as the showing up as the same IP.
{noformat}
03:07 $ tail -F 
/var/folders/5f/_29kl4f524l7gp__02cxby_c0000gn/T/dtest-wu9pai7a/test/node2/logs/debug.log
  | grep handshake
TRACE [MessagingService-NettyOutbound-Thread-4-1] 2018-04-08 03:07:55,360 
OutboundHandshakeHandler.java:107 - starting handshake with peer 
127.0.0.1:7000, msg = FirstHandshakeMessage - messaging version: 12, mode: 
MESSAGING, compress: false, channel local = /127.0.0.1:65020 remote = 
/127.0.0.1:7000
TRACE [MessagingService-NettyOutbound-Thread-4-3] 2018-04-08 03:09:21,427 
OutboundHandshakeHandler.java:107 - starting handshake with peer 
127.0.0.3:7000, msg = FirstHandshakeMessage - messaging version: 12, mode: 
MESSAGING, compress: false, channel local = /127.0.0.3:65091 remote = 
/127.0.0.3:7000
{noformat}
I only modified the trace log line in 
{{OutboundHandshakeHandler#channelActive}} as below -
{noformat}
        logger.trace("starting handshake with peer {}, msg = {}, channel local 
= {} remote = {}", connectionId.connectionAddress(),
                     msg, ctx.channel().localAddress(), 
ctx.channel().remoteAddress());
{noformat}

> Node unable to join cluster in trunk
> ------------------------------------
>
>                 Key: CASSANDRA-14362
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14362
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Lerh Chuan Low
>            Assignee: Jason Brown
>            Priority: Major
>             Fix For: 4.0
>
>
> I'm not sure if anybody has tried using {{trunk}} and starting an actual EC2 
> cluster, but with a 3 node setup that works on 3.11, it fails to work on 
> {{trunk}}. I mistakenly stumbled into it while testing my own code changes. 
> My setup is as follows:
> broadcast_rpc_address: publicIP
> broadcast_address: publicIP
> listen_address: omitted. Ends up as privateIP. 
> Works on 3.11 just fine. 
> On {{trunk}} though, it never works. My node is never able to join the 
> cluster:
> {code}
> Apr 03 05:57:06 ip-10-0-47-122 cassandra[13914]: INFO  [main] 2018-04-03 
> 05:57:05,895 RangeStreamer.java:195 - Bootstrap: range 
> (-128373781239966537,-122439194129870521] exists on 52.88.241.181:7000 for 
> keyspace system_traces
> Apr 03 05:57:06 ip-10-0-47-122 cassandra[13914]: INFO  [main] 2018-04-03 
> 05:57:05,895 RangeStreamer.java:195 - Bootstrap: range 
> (6968670424536541270,6973888347502882935] exists on 52.88.241.181:7000 for 
> keyspace system_traces
> Apr 03 05:57:42 ip-10-0-47-122 cassandra[13914]: WARN  [GossipTasks:1] 
> 2018-04-03 05:57:42,298 FailureDetector.java:324 - Not marking nodes down due 
> to local pause of 26215173446ns > 5000000000ns
> Apr 03 05:57:53 ip-10-0-47-122 cassandra[13914]: WARN  [GossipTasks:1] 
> 2018-04-03 05:57:53,035 FailureDetector.java:324 - Not marking nodes down due 
> to local pause of 10736485907ns > 5000000000ns
> Apr 03 05:58:30 ip-10-0-47-122 cassandra[13914]: WARN  [GossipTasks:1] 
> 2018-04-03 05:58:30,790 Gossiper.java:814 - Gossip stage has 28 pending 
> tasks; skipping status check (no nodes will be marked down)
> Apr 03 05:58:33 ip-10-0-47-122 cassandra[13914]: WARN  [GossipTasks:1] 
> 2018-04-03 05:58:33,060 Gossiper.java:814 - Gossip stage has 20 pending 
> tasks; skipping status check (no nodes will be marked down)
> Apr 03 06:04:33 ip-10-0-47-122 cassandra[13914]: WARN  [GossipTasks:1] 
> 2018-04-03 06:04:33,826 FailureDetector.java:324 - Not marking nodes down due 
> to local pause of 400790432954ns > 5000000000ns
> Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: WARN  [GossipTasks:1] 
> 2018-04-03 06:04:49,133 Gossiper.java:814 - Gossip stage has 2 pending tasks; 
> skipping status check (no nodes will be marked down)
> Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: ERROR 
> [StreamConnectionEstablisher:1] 2018-04-03 06:04:49,138 
> StreamSession.java:524 - [Stream #d4cd6420-3703-11e8-a6a5-e51ddc10cfe6] 
> Streaming error occurred on session with peer 52.88.241.181:7000
> Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: java.io.IOException: failed 
> to connect to 52.88.241.181:7000 (STREAM) for streaming data
> Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at 
> org.apache.cassandra.streaming.DefaultConnectionFactory.createConnection(DefaultConnectionFactory.java:98)
> Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at 
> org.apache.cassandra.streaming.DefaultConnectionFactory.createConnection(DefaultConnectionFactory.java:57)
> Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at 
> org.apache.cassandra.streaming.async.NettyStreamingMessageSender.createChannel(NettyStreamingMessageSender.java:183)
> Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at 
> org.apache.cassandra.streaming.async.NettyStreamingMessageSender.setupControlMessageChannel(NettyStreamingMessageSender.java:165)
> Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at 
> org.apache.cassandra.streaming.async.NettyStreamingMessageSender.sendMessage(NettyStreamingMessageSender.java:222)
> Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at 
> org.apache.cassandra.streaming.async.NettyStreamingMessageSender.initialize(NettyStreamingMessageSender.java:146)
> Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at 
> org.apache.cassandra.streaming.StreamSession.start(StreamSession.java:271)
> Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at 
> org.apache.cassandra.streaming.StreamCoordinator$StreamSessionConnector.run(StreamCoordinator.java:273)
> Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at 
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)
> Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at 
> java.lang.Thread.run(Thread.java:748)
> Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: Caused by: 
> io.netty.channel.unix.Errors$NativeIoException: bind(..) failed: Cannot 
> assign requested address
> Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at 
> io.netty.channel.unix.Errors.newIOException(Errors.java:117)
> Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at 
> io.netty.channel.unix.Socket.bind(Socket.java:266)
> Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at 
> io.netty.channel.epoll.AbstractEpollStreamChannel.doConnect(AbstractEpollStreamChannel.java:729)
> Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at 
> io.netty.channel.epoll.EpollSocketChannel.doConnect(EpollSocketChannel.java:184)
> Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at 
> io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.connect(AbstractEpollStreamChannel.java:797)
> Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at 
> io.netty.channel.DefaultChannelPipeline$HeadContext.connect(DefaultChannelPipeline.java:1274)
> Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at 
> io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:545)
> Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at 
> io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:530)
> Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at 
> io.netty.channel.DefaultChannelPipeline.connect(DefaultChannelPipeline.java:999)
> Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at 
> io.netty.channel.AbstractChannel.connect(AbstractChannel.java:260)
> Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at 
> io.netty.bootstrap.Bootstrap$3.run(Bootstrap.java:254)
> Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at 
> io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
> Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at 
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
> Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at 
> io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:312)
> Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at 
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
> Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at 
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
> Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         ... 1 common frames 
> omitted
> Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: INFO  
> [StreamConnectionEstablisher:1] 2018-04-03 06:04:49,140 
> StreamResultFuture.java:197 - [Stream #d4cd6420-3703-11e8-a6a5-e51ddc10cfe6] 
> Session with 52.88.241.181:7000 is complete
> Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: ERROR 
> [StreamConnectionEstablisher:1] 2018-04-03 06:04:49,146 
> StreamSession.java:524 - [Stream #d4cd6420-3703-11e8-a6a5-e51ddc10cfe6] 
> Streaming error occurred on session with peer 52.88.241.181:7000
> Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: java.lang.RuntimeException: 
> stream has been closed, cannot send Prepare SYN (3 requests,  0 files}
> Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at 
> org.apache.cassandra.streaming.async.NettyStreamingMessageSender.sendMessage(NettyStreamingMessageSender.java:209)
> Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at 
> org.apache.cassandra.streaming.StreamSession.onInitializationComplete(StreamSession.java:495)
> Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at 
> org.apache.cassandra.streaming.StreamSession.start(StreamSession.java:272)
> Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at 
> org.apache.cassandra.streaming.StreamCoordinator$StreamSessionConnector.run(StreamCoordinator.java:273)
> Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at 
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)
> Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at 
> java.lang.Thread.run(Thread.java:748)
> {code}
> Remote debugging it, it eventually seems to come to be due to trying to call 
> {{socket.bind}} in {{doConnect}} in {{AbstractEpollStreamChannel}}. Local 
> address is <publicIP>:0. I think port 0 is fine because it just means pick 
> the next ephemeral port available, but there's no way of binding to the 
> public IP. 
> I made a commit 
> https://github.com/apache/cassandra/commit/00222ddf0694f1c35c2bdd84fd7407174c3fc57a
>  that seems to have fixed it so I can continue on with my purposes, and that 
> is have {{StreamSession}} use listen address instead of broadcast address 
> which may be public. In the event listen address is public I guess that means 
> if it was able to bind to that interface to begin with then {{socket.bind}} 
> should also work. 
> Not sure if this is the right way though, and it seems to have been 
> introduced by the Streaming rework to netty. 
> https://issues.apache.org/jira/browse/CASSANDRA-12229. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to