[ 
https://issues.apache.org/jira/browse/CASSANDRA-13323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15907017#comment-15907017
 ] 

Sylvain Lebresne commented on CASSANDRA-13323:
----------------------------------------------

Pretty sure this patch is not going to work. When you get the 
{{UnknownColumnFamilyException}}, only part of the message has been 
deserialized, so trying to deserialize further messages on that connection is 
going to read (what looks like) garbage. This is, in fact, why we currently just 
throw out the connection: it's the simplest, safest thing to do.

This doesn't mean, btw, that we couldn't have a way to resume after a failed 
message (at least when we know the failure is not due to a corrupted stream, as 
in this particular case), but it's a bit more involved. The simplest 
somewhat-generic solution I see, fwiw, would be to wrap the DataInput into one 
that counts how many bytes are deserialized. We'd reset the counter at the 
beginning of each payload and, on an exception, we'd know how many bytes we have 
to skip to resume reading at the next message properly.
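
To make the counting idea concrete, here is a minimal, self-contained sketch. 
It is not the Cassandra code and not a proposed patch. It assumes each payload 
is preceded by its size as an int, and it counts at the InputStream level 
(under a plain DataInputStream) rather than implementing a full DataInput 
wrapper, which keeps the example short. The names CountingInputStream, 
ResumableReader, frame and deserialize are hypothetical, and an 
IllegalStateException stands in for {{UnknownColumnFamilyException}}.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

class ResumableReader {
    /** Hypothetical payload layout: an int table id followed by a UTF string. */
    static String deserialize(DataInputStream in) throws IOException {
        int tableId = in.readInt();
        if (tableId != 1) // stand-in for UnknownColumnFamilyException
            throw new IllegalStateException("unknown table " + tableId);
        return in.readUTF();
    }

    /** Reads every message, skipping the unread tail of any that fails. */
    static List<String> readAll(byte[] wire) throws IOException {
        CountingInputStream counter = new CountingInputStream(new ByteArrayInputStream(wire));
        DataInputStream in = new DataInputStream(counter);
        List<String> messages = new ArrayList<>();
        while (true) {
            int payloadSize;
            try {
                payloadSize = in.readInt(); // assumed size prefix
            } catch (EOFException e) {
                break; // no more messages
            }
            counter.resetCount(); // count only this payload's bytes
            try {
                messages.add(deserialize(in));
            } catch (IllegalStateException e) {
                // Stream is intact; only the content was unusable. Skip the rest.
                long remaining = payloadSize - counter.count();
                while (remaining > 0) {
                    int skipped = in.skipBytes((int) remaining);
                    if (skipped <= 0) throw new EOFException("truncated payload");
                    remaining -= skipped;
                }
            }
        }
        return messages;
    }

    /** Builds one size-prefixed message for the demo. */
    static byte[] frame(int tableId, String body) throws IOException {
        ByteArrayOutputStream payload = new ByteArrayOutputStream();
        DataOutputStream p = new DataOutputStream(payload);
        p.writeInt(tableId);
        p.writeUTF(body);
        ByteArrayOutputStream framed = new ByteArrayOutputStream();
        DataOutputStream f = new DataOutputStream(framed);
        f.writeInt(payload.size());
        f.write(payload.toByteArray());
        return framed.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        wire.write(frame(1, "first"));
        wire.write(frame(99, "dropped")); // fails after reading only the table id
        wire.write(frame(1, "third"));
        System.out.println(readAll(wire.toByteArray())); // prints [first, third]
    }
}

/** Counts bytes consumed from the wrapped stream since the last reset. */
class CountingInputStream extends FilterInputStream {
    private long count;

    CountingInputStream(InputStream in) { super(in); }

    void resetCount() { count = 0; }

    long count() { return count; }

    @Override public int read() throws IOException {
        int b = super.read();
        if (b >= 0) count++;
        return b;
    }

    @Override public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) count += n;
        return n;
    }

    @Override public long skip(long n) throws IOException {
        long skipped = super.skip(n);
        count += skipped;
        return skipped;
    }
}
```

The skip is only safe for failures that leave the stream itself intact, such 
as an unknown table id. On an IOException or a suspected corrupted stream, 
closing the connection remains the safe choice.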

> IncomingTcpConnection closed due to one bad message
> ---------------------------------------------------
>
>                 Key: CASSANDRA-13323
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13323
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Simon Zhou
>            Assignee: Simon Zhou
>             Fix For: 3.0.13
>
>         Attachments: CASSANDRA-13323-v1.patch
>
>
> We got this exception:
> {code}
> WARN  [MessagingService-Incoming-/****] 2017-02-14 17:33:33,177 
> IncomingTcpConnection.java:101 - UnknownColumnFamilyException reading from 
> socket; closing
> org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find table for 
> cfId 2a3ab630-df74-11e6-9f81-b56251e1559e. If a table was just created, this 
> is likely due to the schema not being fully propagated.  Please wait for 
> schema agreement on table creation.
>     at 
> org.apache.cassandra.config.CFMetaData$Serializer.deserialize(CFMetaData.java:1336)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at 
> org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize30(PartitionUpdate.java:660)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at 
> org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize(PartitionUpdate.java:635)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at 
> org.apache.cassandra.service.paxos.Commit$CommitSerializer.deserialize(Commit.java:131)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at 
> org.apache.cassandra.service.paxos.Commit$CommitSerializer.deserialize(Commit.java:113)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98) 
> ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:201)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:178)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at 
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:92)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> {code}
> We also saw this log on another host, indicating that it needed to reconnect:
> {code}
> INFO  [HANDSHAKE-/****] 2017-02-21 13:37:50,216 
> OutboundTcpConnection.java:515 - Handshaking version with /****
> {code}
> The reason is that the node was receiving hinted data for a dropped table. 
> This may happen with other messages as well. On the Cassandra side, 
> IncomingTcpConnection shouldn't close on just one bad message, even though it 
> will be restarted shortly afterwards by SocketThread in MessagingService.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
