[ 
https://issues.apache.org/jira/browse/CASSANDRA-13323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15923619#comment-15923619
 ] 

Simon Zhou commented on CASSANDRA-13323:
----------------------------------------

Thanks [~slebresne] for the comment. For hinted handoff of a dropped table, the 
UnknownColumnFamilyException is already handled in 
HintMessage#Serializer#deserialize. A HintMessage is still returned, but its 
internal data (Hint) is null, so it is ignored in HintVerbHandler. 
UnknownColumnFamilyException therefore only causes some overhead 
(deserialization, etc.) on the receiver side of hinted handoff. At this point 
I tend to think hinted handoff is unrelated to IncomingTcpConnection being 
closed, but I'll double check.

The stack trace I posted in this ticket is actually for a paxos commit. 
Unfortunately, CommitSerializer doesn't take the message size into account, 
so I cannot just catch UnknownColumnFamilyException and skip the rest of the 
message in the DataInputPlus. To fix that, we would have to update the 
protocol a bit (maybe introduce MessagingService.VERSION_3xx). Do you think it 
is worth the effort? I've lost the original logs, so I cannot confirm the scope 
of this issue. One of the downsides of a binary protocol is that it's hard to 
maintain backward compatibility.
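
As a rough illustration of why the message size matters, here is a minimal 
sketch in plain Java. It is not Cassandra code: SizePrefixedFraming, 
UnknownTableException and the wire layout ([int bodySize][int tableId][UTF 
value]) are all hypothetical names and choices made up for this example. The 
point is only that once each message carries its size, the reader can drop a 
message for an unknown table and still be positioned at the start of the next 
one, so the connection does not have to be closed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; none of these names are Cassandra classes.
class SizePrefixedFraming {
    /** Stand-in for UnknownColumnFamilyException. */
    static class UnknownTableException extends IOException {
        UnknownTableException(int tableId) {
            super("Unknown table id " + tableId);
        }
    }

    /** Writes one message as [int bodySize][int tableId][UTF value]. */
    static void write(DataOutputStream out, int tableId, String value) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream body = new DataOutputStream(buf);
        body.writeInt(tableId);
        body.writeUTF(value);
        body.flush();
        out.writeInt(buf.size()); // the size prefix that makes skipping possible
        buf.writeTo(out);
    }

    /** Decodes a message body; fails part-way through if the table is unknown. */
    static String decodeBody(DataInputStream in, Set<Integer> knownTables) throws IOException {
        int tableId = in.readInt();
        if (!knownTables.contains(tableId))
            throw new UnknownTableException(tableId);
        return in.readUTF();
    }

    /**
     * Reads one message. Returns null for an unknown table, but always
     * consumes exactly bodySize bytes, so the stream stays aligned.
     */
    static String read(DataInputStream in, Set<Integer> knownTables) throws IOException {
        int size = in.readInt();
        byte[] body = new byte[size];
        in.readFully(body); // consume the whole message before decoding it
        try {
            return decodeBody(new DataInputStream(new ByteArrayInputStream(body)), knownTables);
        } catch (UnknownTableException e) {
            return null; // bad message dropped; the caller keeps reading
        }
    }

    /** Reads count messages and keeps only those that decoded successfully. */
    static List<String> readAll(byte[] wire, int count, Set<Integer> knownTables) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire));
        List<String> decoded = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            String value = read(in, knownTables);
            if (value != null)
                decoded.add(value);
        }
        return decoded;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(wire);
        write(out, 1, "first");
        write(out, 99, "for a dropped table"); // table 99 is unknown to the reader
        write(out, 1, "second");
        System.out.println(readAll(wire.toByteArray(), 3, Collections.singleton(1)));
    }
}
```

This sketch buffers each message body before decoding it, which is the 
simplest way to stay aligned. A streaming reader would instead count the bytes 
consumed before the exception and skip the remainder. Either way the size has 
to be on the wire, which is why a format without it needs a protocol change.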

> IncomingTcpConnection closed due to one bad message
> ---------------------------------------------------
>
>                 Key: CASSANDRA-13323
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13323
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Simon Zhou
>            Assignee: Simon Zhou
>             Fix For: 3.0.13
>
>         Attachments: CASSANDRA-13323-v1.patch
>
>
> We got this exception:
> {code}
> WARN  [MessagingService-Incoming-/****] 2017-02-14 17:33:33,177 
> IncomingTcpConnection.java:101 - UnknownColumnFamilyException reading from 
> socket; closing
> org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find table for 
> cfId 2a3ab630-df74-11e6-9f81-b56251e1559e. If a table was just created, this 
> is likely due to the schema not being fully propagated.  Please wait for 
> schema agreement on table creation.
>     at 
> org.apache.cassandra.config.CFMetaData$Serializer.deserialize(CFMetaData.java:1336)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at 
> org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize30(PartitionUpdate.java:660)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at 
> org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize(PartitionUpdate.java:635)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at 
> org.apache.cassandra.service.paxos.Commit$CommitSerializer.deserialize(Commit.java:131)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at 
> org.apache.cassandra.service.paxos.Commit$CommitSerializer.deserialize(Commit.java:113)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98) 
> ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:201)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:178)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at 
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:92)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> {code}
> We also saw this log on another host, indicating that it needed to reconnect:
> {code}
> INFO  [HANDSHAKE-/****] 2017-02-21 13:37:50,216 
> OutboundTcpConnection.java:515 - Handshaking version with /****
> {code}
> The reason is that the node was receiving hinted data for a dropped table. 
> This may happen with other messages as well. On the Cassandra side, 
> IncomingTcpConnection shouldn't close on just one bad message, even though it 
> will be restarted shortly afterwards by SocketThread in MessagingService.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
