[ https://issues.apache.org/jira/browse/CASSANDRA-16581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17318159#comment-17318159 ]

David Capwell commented on CASSANDRA-16581:
-------------------------------------------

Spoke with Sam and we need to review a few things first:

1) ProtocolException doesn't distinguish between a simple user error (bad CL)
and a corrupt message (the test added uses frame version v84, which doesn't
exist). Reconnects are not free, so this lack of clarity in the exception can
be problematic when deciding whether to close the socket; this problem exists
across the 3.x and 4.x lines.
2) The v5 protocol can carry multiple streams in the same frame, and the client
might not be able to map a stream back to its frame, so a frame-level issue
(such as checkpointing) can become a problem; this problem is localized to 4.x.

> Failure to execute queries should emit a KPI other than read 
> timeout/unavailable so it can be alerted/tracked
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-16581
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16581
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Messaging/Client, Observability/Metrics
>            Reporter: David Capwell
>            Assignee: David Capwell
>            Priority: Normal
>             Fix For: 3.0.x, 3.11.x, 4.0-rc
>
>
> When we are unable to parse a message, we have no way to detect this from a
> monitoring point of view, so we can get into situations where we believe the
> database is fine but the clients are on fire. This case popped up in the
> 2.1 to 3.0 upgrade, as paging state wasn't mixed-mode safe.
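A minimal sketch of the kind of KPI the issue asks for. It assumes a hypothetical `ClientErrorCounters` class backed by `java.util.concurrent.atomic.LongAdder`, not Cassandra's actual metrics registry, and the cause names are illustrative: each failure to decode or execute a client request increments a counter keyed by cause, so monitoring can alert on it even while read timeout/unavailable counts stay flat.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical counters; not Cassandra's ClientMetrics API.
public final class ClientErrorCounters {

    /** Causes tracked separately so alerts can tell user mistakes from corrupt traffic. */
    public enum Cause { PROTOCOL_USER_ERROR, PROTOCOL_CORRUPT_MESSAGE, UNKNOWN }

    // Fully populated in the constructor and never structurally modified afterwards.
    private final Map<Cause, LongAdder> counters = new EnumMap<>(Cause.class);

    public ClientErrorCounters() {
        for (Cause cause : Cause.values()) {
            counters.put(cause, new LongAdder());
        }
    }

    /** Record one failed request; LongAdder keeps the increment cheap under contention. */
    public void mark(Cause cause) {
        counters.get(cause).increment();
    }

    /** Current total for a cause, e.g. for a periodic metrics reporter to export. */
    public long count(Cause cause) {
        return counters.get(cause).sum();
    }

    public static void main(String[] args) {
        ClientErrorCounters metrics = new ClientErrorCounters();
        // Simulate two messages that fail to decode and one plain user error.
        metrics.mark(Cause.PROTOCOL_CORRUPT_MESSAGE);
        metrics.mark(Cause.PROTOCOL_CORRUPT_MESSAGE);
        metrics.mark(Cause.PROTOCOL_USER_ERROR);
        System.out.println(metrics.count(Cause.PROTOCOL_CORRUPT_MESSAGE)); // 2
        System.out.println(metrics.count(Cause.UNKNOWN));                  // 0
    }
}
```

Splitting the counter by cause is a design choice here: a single "failed requests" number would page on harmless user mistakes, while a dedicated corrupt-message counter surfaces the mixed-version parsing failures described above.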



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
