Hi Tommy,

Thank you for taking the time to start kicking the tires on the upgrade to
4.0. It looks like you've found two bugs:

1) "Unknown column coordinator_port during deserialization" (reported on
3.x nodes)

- looks like the 4.0 node isn't filtering out a column from a system table
that 3.0 doesn't know about. Most likely due to CASSANDRA-7544. Can you
open a JIRA for this, and tag @aweisberg?

2) SSL connection problems

I unserstand the 4.0 -> 3.X connection problem, and documented it at [1] in
MessagingService. TL;DR we don't know the version of a peer when restarting
and need to wait for that peer to connect to the local node and pass it's
correct messaging version (if the local node cannot interpret the handhsake
from the peer). However, why for the inbound connection to the 4.0 node it
is seeing SSLv2 is unclear. Can you open a separate JIRA, and we'll go from
there? In the meantime, maybe enable the JDK's SSL debugging [2] on the 3.x
node to see exactly what it is trying to do? Also, you can enable wire
tracing on the 4.0 node by setting this value to true [3] and recompiling.
We can followup further in the jira.

Thanks!

-Jason

[1]
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/MessagingService.java#L1668
[2]
https://docs.oracle.com/javase/8/docs/technotes/guides/security/jsse/ReadDebug.html
[3]
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/async/NettyFactory.java#L84

On Tue, Oct 23, 2018 at 2:44 AM Tommy Stendahl <tommy.stend...@ericsson.com>
wrote:

> Hi,
>
> I have been testing upgrade to 4.0, I started out with a cluster with
> 3.0.15 and server encryption enabled. Due to some issues in my
> environment I did upgrade one of the nodes to 3.11.3, I think this
> turned out to be a good thing since I could observer the behaviour of
> upgrading from both 3.0.15 and 3.11.3 at the same time.
>
> At first I didn't have much success at all, it look like found multiple
> issues mostly with server encryption so I decided to simplify thing and
> disabled server encryption.
>
> So with server encryption disabled the upgrade was working ok, what I
> did notice was exceptions in the 3.0.15 and 3.11.3 nodes once the first
> 4.0 node started.
>
> 3.0.15 exception:
> 2018-10-22T11:05:38.883+0200 ERROR
> [MessagingService-Incoming-/10.216.193.244] CassandraDaemon.java:223
> Exception in thread Thread[MessagingService-Incoming-/10.216.193.244
> ,5,main]
> java.lang.RuntimeException: Unknown column coordinator_port during
> deserialization
>          at
> org.apache.cassandra.db.Columns$Serializer.deserialize(Columns.java:433)
> ~[apache-cassandra-3.0.15.jar:3.0.15]
>          at
> org.apache.cassandra.db.filter.ColumnFilter$Serializer.deserialize(ColumnFilter.java:447)
>
> ~[apache-cassandra-3.0.15.jar:3.0.15]
>          at
> org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:647)
>
> ~[apache-cassandra-3.0.15.jar:3.0.15]
>          at
> org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:584)
>
> ~[apache-cassandra-3.0.15.jar:3.0.15]
>          at
> org.apache.cassandra.io.ForwardingVersionedSerializer.deserialize(ForwardingVersionedSerializer.java:50)
>
> ~[apache-cassandra-3.0.15.jar:3.0.15]
>          at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98)
> ~[apache-cassandra-3.0.15.jar:3.0.15]
>          at
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:201)
>
> ~[apache-cassandra-3.0.15.jar:3.0.15]
>          at
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:178)
>
> ~[apache-cassandra-3.0.15.jar:3.0.15]
>          at
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:92)
>
> ~[apache-cassandra-3.0.15.jar:3.0.15]
>
> 3.11.3 exception:
> 2018-10-22T11:12:05.060+0200 ERROR
> [MessagingService-Incoming-/10.216.193.244] CassandraDaemon.java:228
> Exception in thread Thread[MessagingService-Incoming-/10.216.193.244
> ,5,main]
> java.lang.RuntimeException: Unknown column coordinator_port during
> deserialization
>          at
> org.apache.cassandra.db.Columns$Serializer.deserialize(Columns.java:452)
> ~[apache-cassandra-3.11.3.jar:3.11.3]
>          at
> org.apache.cassandra.db.filter.ColumnFilter$Serializer.deserialize(ColumnFilter.java:482)
>
> ~[apache-cassandra-3.11.3.jar:3.11.3]
>          at
> org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:760)
>
> ~[apache-cassandra-3.11.3.jar:3.11.3]
>          at
> org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:697)
>
> ~[apache-cassandra-3.11.3.jar:3.11.3]
>          at
> org.apache.cassandra.io.ForwardingVersionedSerializer.deserialize(ForwardingVersionedSerializer.java:50)
>
> ~[apache-cassandra-3.11.3.jar:3.11.3]
>          at org.apache.cassandra.net.MessageIn.read(MessageIn.java:123)
> ~[apache-cassandra-3.11.3.jar:3.11.3]
>          at
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:192)
>
> ~[apache-cassandra-3.11.3.jar:3.11.3]
>          at
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:180)
>
> ~[apache-cassandra-3.11.3.jar:3.11.3]
>          at
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:94)
>
> ~[apache-cassandra-3.11.3.jar:3.11.3]
>
> I got one or two of these exceptions every minute, I could not see that
> this was causing any problems but I don't really like exceptions in the
> log. Should we expect exceptions like this during an upgrade or is this
> a problem we should look in to?
>
> So now I enabled server encryption again, with this configuration:
> storage_port: 12700
> ssl_storage_port: 12701
> server_encryption_options:
>      # set to true for allowing secure incoming connections
>      enabled: true
>      # If enabled and optional are both set to true, encrypted and
> unencrypted connections are handled on the storage_port
>      optional: false
>      # if enabled, will open up an encrypted listening socket on
> ssl_storage_port. Should be used
>      # during upgrade to 4.0; otherwise, set to false.
>      enable_legacy_ssl_storage_port: true
>      # on outbound connections, determine which type of peers to
> securely connect to. 'enabled' must be set to true.
>      internode_encryption: all
>      keystore: /usr/share/cassandra/.ssl/cil-intern/server/keystore.jks
>      keystore_password: '*********'
>      truststore: /usr/share/cassandra/.ssl/cil-intern/server/truststore.jks
>      truststore_password: '**********'
>      # More advanced defaults below:
>      protocol: TLSv1.2
>      # store_type: JKS
>      cipher_suites:
>
> [TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_128_CBC_SHA]
>      # require_client_auth: false
>      # require_endpoint_verification: false
>
> As far as I understand this configuration is correct but the 4.0 node
> can't contact the old nodes. I have tried with the default protocol
> setting also but it makes no difference. The behaviour is different in
> 3.0.15 and 3.11.3, in 3.0.15 I get this exception:
> 2018-10-22T11:57:44.366+0200 ERROR
> [MessagingService-NettyInbound-Thread-3-8]
> InboundHandshakeHandler.java:300 Failed to properly handshake with peer
> /10.216.193.246:58296. Closing the channel.
> io.netty.handler.codec.DecoderException:
> javax.net.ssl.SSLHandshakeException: SSLv2Hello is disabled
>          at
>
> io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:459)
>          at
>
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)
>          at
>
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
>          at
>
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
>          at
>
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
>          at
>
> io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434)
>          at
>
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
>          at
>
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
>          at
>
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965)
>          at
>
> io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:808)
>          at
> io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:417)
>          at
> io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:317)
>          at
>
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
>          at
>
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>          at java.lang.Thread.run(Thread.java:748)
> Caused by: javax.net.ssl.SSLHandshakeException: SSLv2Hello is disabled
>          at
> sun.security.ssl.InputRecord.handleUnknownRecord(InputRecord.java:637)
>          at sun.security.ssl.InputRecord.read(InputRecord.java:527)
>          at
> sun.security.ssl.EngineInputRecord.read(EngineInputRecord.java:382)
>          at
> sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:962)
>          at
> sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:907)
>          at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:781)
>          at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624)
>          at
> io.netty.handler.ssl.SslHandler$SslEngineType$3.unwrap(SslHandler.java:294)
>          at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1275)
>          at
> io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1177)
>          at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1221)
>          at
>
> io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489)
>          at
>
> io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428)
>          ... 14 common frames omitted
>
> and in the 4.0 debug log I see this related to the 3.0.15 node:
> 2018-10-22T13:25:34.414+0200 DEBUG [ScheduledFastTasks:1]
> OutboundMessagingConnection.java:259 connection attempt 788 to
> 10.216.193.246:12700 (GOSSIP)
> 2018-10-22T13:25:34.414+0200 DEBUG [ScheduledFastTasks:1]
> NettyFactory.java:321 creating outbound bootstrap to peer
> 10.216.193.246:12700, compression: true, encryption: enabled (jdk),
> coalesce: DISABLED, protocolV
> ersion: 12
> 2018-10-22T13:25:34.526+0200 DEBUG [GossipTasks:1]
> OutboundMessagingConnection.java:259 connection attempt 789 to
> 10.216.193.246:12700 (GOSSIP)
> 2018-10-22T13:25:34.527+0200 DEBUG [GossipTasks:1] NettyFactory.java:321
> creating outbound bootstrap to peer 10.216.193.246:12700, compression:
> true, encryption: enabled (jdk), coalesce: DISABLED, protocolVersion:
>   12
>
> In the 3.11.3 node I don't see any errors, I do see this:
> 2018-10-22T13:20:36.270+0200  INFO [HANDSHAKE-/10.216.193.244]
> OutboundTcpConnection.java:561 Handshaking version with /10.216.193.244
>
> and in the 4.0 node I find this:
> 2018-10-22T13:20:36.270+0200  INFO
> [MessagingService-NettyInbound-Thread-3-1]
> InboundHandshakeHandler.java:146 connection from peer
> /10.216.193.243:54262, protocol = TLSv1.2, cipher suite =
> TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
> 2018-10-22T13:20:41.147+0200 DEBUG [main]
> OutboundMessagingConnection.java:259 connection attempt 0 to
> 10.216.193.243:12700 (GOSSIP)
> 2018-10-22T13:20:41.147+0200 DEBUG [main] NettyFactory.java:321 creating
> outbound bootstrap to peer 10.216.193.243:12700, compression: true,
> encryption: enabled (jdk), coalesce: DISABLED, protocolVersion: 11
> 2018-10-22T13:20:41.248+0200 DEBUG [ScheduledFastTasks:1]
> OutboundMessagingConnection.java:259 connection attempt 1 to
> 10.216.193.243:12700 (GOSSIP)
> 2018-10-22T13:20:41.248+0200 DEBUG [ScheduledFastTasks:1]
> NettyFactory.java:321 creating outbound bootstrap to peer
> 10.216.193.243:12700, compression: true, encryption: enabled (jdk),
> coalesce: DISABLED, protocolV
> ersion: 11
>
> It looks like the 4.0 node doesn’t accept incoming messages from the old
> nodes on the ssl_storage_port and continues to try to contact the old
> nodes on the storage_port that are disabled in the old nodes. I don't
> really know what to make of this, any ideas or suggestions?
>
> Regards,
> Tommy
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>

Reply via email to