[ https://issues.apache.org/jira/browse/CASSGO-97?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18040800#comment-18040800 ]
Bohdan Siryk commented on CASSGO-97:
------------------------------------
{quote}Since a protocol error means something went wrong with the protocol
itself, it's possible that the server doesn't even know which stream id to use
in the response (maybe it wasn't even able to decode the frames properly). In
these scenarios the server can just return stream id 0 as a default. A protocol
error should always lead to closing the connection anyway, so regardless of
which stream id is used, the end result should be the cancellation (or retry on
the next connection) of all pending requests.
{quote}
That makes sense, thanks for the explanation. So it was referring to a protocol
error response in general, which can be caused by any protocol-related problem.
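To illustrate the point in the quote, here is a minimal, self-contained sketch (not gocql code) of a connection that treats a Protocol_error response as a connection-level failure: every pending request is failed and the connection is marked closed, whatever stream id the server put in the header. The pendingConn type and handleFrame function are hypothetical; the ERROR opcode (0x00), the Protocol_error code (0x000A) and the 9-byte v3+ header layout follow the CQL native protocol spec. A real error body also carries a message string after the code, which this sketch ignores.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Values from the CQL native protocol spec (v3+ frame header layout).
const (
	opError         byte   = 0x00   // ERROR response opcode
	errCodeProtocol uint32 = 0x000A // Protocol_error error code
	headerLen              = 9      // version, flags, stream (2), opcode, length (4)
)

// pendingConn is a hypothetical, stripped-down connection that tracks
// in-flight requests by stream id.
type pendingConn struct {
	calls  map[int16]chan error
	closed bool
}

// handleFrame routes one response frame. A Protocol_error fails every pending
// call and closes the connection, no matter which stream id it carries.
func (c *pendingConn) handleFrame(frame []byte) error {
	if len(frame) < headerLen {
		return errors.New("short frame")
	}
	stream := int16(binary.BigEndian.Uint16(frame[2:4]))
	opcode := frame[4]
	body := frame[headerLen:]

	if opcode == opError && len(body) >= 4 &&
		binary.BigEndian.Uint32(body[:4]) == errCodeProtocol {
		err := fmt.Errorf("protocol error (stream id %d)", stream)
		for id, ch := range c.calls {
			ch <- err // channels are buffered, so this never blocks
			delete(c.calls, id)
		}
		c.closed = true
		return err
	}

	// Any other response goes only to the request that owns the stream.
	if ch, ok := c.calls[stream]; ok {
		ch <- nil
		delete(c.calls, stream)
	}
	return nil
}

func main() {
	c := &pendingConn{calls: map[int16]chan error{
		1: make(chan error, 1),
		2: make(chan error, 1),
	}}
	// ERROR frame with Protocol_error, but stream id 7 instead of 0.
	frame := []byte{0x84, 0x00, 0x00, 0x07, opError, 0, 0, 0, 4, 0, 0, 0, 0x0A}
	err := c.handleFrame(frame)
	fmt.Println(err, c.closed, len(c.calls)) // protocol error (stream id 7) true 0
}
```

With this shape the stream id only matters for normal responses, which is why the server returning 0 (or any other id) for a protocol error should not change the outcome.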
From my testing, the issue is caused by bad error wrapping
[here|https://github.com/apache/cassandra-gocql-driver/blob/22ab88e75597baf630dc553439039ca1f2ad3bfc/conn.go#L413]
and by the fact that we no longer set the beta flag for proto v5, which C* 3.11
requires. This is the actual reason why we can't reproduce it on C* 4+.
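For context on the beta flag: per the native protocol spec, the frame header's flags byte has a "Use beta" bit (0x10), and a server that marks a protocol version as beta (as C* 3.11 does for v5) answers with an ERROR if the client doesn't set it. The sketch below only shows where that bit lives in a STARTUP header; the constant and function names are hypothetical and this is not how gocql builds frames.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Values from the CQL native protocol spec; the constant names are illustrative.
const (
	protoV5     byte = 0x05 // request frame: direction bit 0x80 unset
	flagUseBeta byte = 0x10 // "Use beta" header flag
	opStartup   byte = 0x01 // STARTUP opcode
)

// startupHeader builds the 9-byte frame header (version, flags, stream,
// opcode, body length) for a STARTUP request on stream 0.
func startupHeader(version byte, useBeta bool, bodyLen uint32) []byte {
	h := make([]byte, 9)
	h[0] = version
	if useBeta {
		h[1] |= flagUseBeta // opt in to a version the server marks as beta
	}
	binary.BigEndian.PutUint16(h[2:4], 0) // stream id 0
	h[4] = opStartup
	binary.BigEndian.PutUint32(h[5:9], bodyLen)
	return h
}

func main() {
	fmt.Printf("% x\n", startupHeader(protoV5, true, 22))  // 05 10 00 00 01 00 00 00 16
	fmt.Printf("% x\n", startupHeader(protoV5, false, 22)) // 05 00 00 00 01 00 00 00 16
}
```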
I see 2 possible solutions here:
# format the returned value itself instead of only its type (%T), so that we at
least get a message we can parse later.
# try every protocol version, from the latest to the oldest, until we find one
that is supported.
I prefer the second option. We can modify
[discoverProtocol|https://github.com/apache/cassandra-gocql-driver/blob/22ab88e75597baf630dc553439039ca1f2ad3bfc/control.go#L261]
to iterate over all versions supported by gocql and try to dial with each of
them until one succeeds:
{code:java}
func (c *controlConn) discoverProtocol(hosts []*HostInfo) (int, error) {
	hosts = shuffleHosts(hosts)

	handler := connErrorHandlerFn(func(c *Conn, err error, closed bool) {
		// we should never get here, but if we do it means we connected to a
		// host successfully which means our attempted protocol version worked
		if !closed {
			c.Close()
		}
	})

	var err error
	for _, host := range hosts {
		connCfg := *c.session.connCfg
		// Try every version gocql supports, newest first, until the host accepts one.
		for proto := highestProtocolVersionSupported; proto >= lowestProtocolVersionSupported; proto-- {
			connCfg.ProtoVersion = proto

			var conn *Conn
			conn, err = c.session.dial(c.session.ctx, host, &connCfg, handler)
			if conn != nil {
				// The connection is only needed to probe the version.
				conn.Close()
			}
			if err == nil {
				c.session.logger.Debug("Discovered protocol version using host.",
					NewLogFieldInt("protocol_version", connCfg.ProtoVersion),
					NewLogFieldIP("host_addr", host.ConnectAddress()),
					NewLogFieldString("host_id", host.HostID()))
				return connCfg.ProtoVersion, nil
			}

			c.session.logger.Debug("Failed to discover protocol version using host.",
				NewLogFieldInt("protocol_version", proto),
				NewLogFieldIP("host_addr", host.ConnectAddress()),
				NewLogFieldString("host_id", host.HostID()),
				NewLogFieldError("err", err))
		}
	}

	// Every host rejected every version; return the last error seen.
	return 0, err
}
{code}
This would also eliminate the issue this ticket describes. The original issue
is caused by the missing beta flag for protocol v5 with C* 3.11 and by bad
error wrapping, both of which would be sidestepped if we simply try each
supported protocol version. Of course, this will slow down protocol negotiation
somewhat, but the tradeoff seems reasonable to me since it only happens during
session initialisation.
However, you still won't be able to connect to C* 3.11 with proto v5, because
there is no public API to set the beta flag.
> Protocol version negotiation doesn't work if server replies with stream id different than 0
> -------------------------------------------------------------------------------------------
>
> Key: CASSGO-97
> URL: https://issues.apache.org/jira/browse/CASSGO-97
> Project: Apache Cassandra Go driver
> Issue Type: Bug
> Components: Core
> Reporter: João Reis
> Priority: Normal
> Fix For: 2.x
>
>
> If the server's ProtocolError response comes with stream id 0 then [this
> code|https://github.com/apache/cassandra-gocql-driver/blob/0326fae3617dd19b901f2e9a97479c04fc11e05a/conn.go#L685-L700]
> will create the protocol error object.
> If the response comes with a positive stream id then [this
> code|https://github.com/apache/cassandra-gocql-driver/blob/0326fae3617dd19b901f2e9a97479c04fc11e05a/conn.go#L1314-L1330]
> will create the protocol error object. This latter way of creating the error
> makes [the regex check not
> work|https://github.com/apache/cassandra-gocql-driver/blob/0326fae3617dd19b901f2e9a97479c04fc11e05a/control.go#L210-L245].
> This was found when trying to connect to
> [ZDM-Proxy|https://github.com/datastax/zdm-proxy/] but connecting to a C*
> 3.11.x cluster works fine.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)