[ 
https://issues.apache.org/jira/browse/CASSGO-97?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18040800#comment-18040800
 ] 

Bohdan Siryk commented on CASSGO-97:
------------------------------------

{quote}Since a protocol error means something went wrong related to the 
protocol itself it's possible that the server doesn't even know which stream id 
to use in the response (maybe it wasn't even able to decode the frames 
properly). In these scenarios the server can just return stream id 0 as a 
default, a protocol error should always lead to closing the connection anyway 
so regardless of what stream id is used the end result should be the 
cancellation (or retry on the next connection) of all pending requests
{quote}
That makes sense, thanks for the explanation. So it was referring to a protocol 
error opcode, which could be any protocol-related error.

From my testing, the issue is caused by bad error wrapping 
[here|https://github.com/apache/cassandra-gocql-driver/blob/22ab88e75597baf630dc553439039ca1f2ad3bfc/conn.go#L413] 
and by the fact that we no longer set the beta flag for proto v5, which 
C* 3.11 requires. That is the actual reason why we can't reproduce it on 
C* 4+.

I see 2 possible solutions here:
 # Format the value of the returned response into the error instead of its 
%T type, so that we at least get a message we can parse later.
 # Try every protocol version from the latest to the oldest until we find 
one that is supported.
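
To illustrate the first option, here is a minimal, self-contained sketch of why formatting only the type loses the information needed later. Everything in it is hypothetical and not taken from the driver: the errorFrame type, the wrapWithType and wrapWithValue helpers, the sample server message, and the regular expression are stand-ins chosen for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// errorFrame is a hypothetical stand-in for a server error response.
type errorFrame struct {
	code    int
	message string
}

func (e errorFrame) Error() string {
	return fmt.Sprintf("[code=%d] %s", e.code, e.message)
}

// greatestVersionRe is an illustrative pattern, not necessarily the one
// the driver uses.
var greatestVersionRe = regexp.MustCompile(`the greatest is (\d+)`)

// wrapWithType formats only the Go type of the response, so the server's
// message never reaches the resulting error string.
func wrapWithType(frame error) error {
	return fmt.Errorf("unexpected response to startup frame: %T", frame)
}

// wrapWithValue wraps the response itself, so the message survives and
// the original error stays reachable through errors.As / errors.Unwrap.
func wrapWithValue(frame error) error {
	return fmt.Errorf("unexpected response to startup frame: %w", frame)
}

// greatestVersion extracts the version number from an error message, if present.
func greatestVersion(err error) (string, bool) {
	m := greatestVersionRe.FindStringSubmatch(err.Error())
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	frame := errorFrame{
		code:    10,
		message: "unsupported protocol version (5); the lowest supported version is 3 and the greatest is 4",
	}

	_, ok := greatestVersion(wrapWithType(frame))
	fmt.Println("type only, parsed:", ok) // prints "type only, parsed: false"

	v, ok := greatestVersion(wrapWithValue(frame))
	fmt.Println("value, parsed:", ok, v) // prints "value, parsed: true 4"
}
```

With the value wrapped, any later parsing of the message has something to work with; with only the type, it sees just a type name.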

I like the second option more. We can modify 
[discoverProtocol|https://github.com/apache/cassandra-gocql-driver/blob/22ab88e75597baf630dc553439039ca1f2ad3bfc/control.go#L261]
 to iterate over all versions supported by gocql and try to dial with each of 
them until one succeeds:
{code:java}
func (c *controlConn) discoverProtocol(hosts []*HostInfo) (int, error) {
    hosts = shuffleHosts(hosts)

    handler := connErrorHandlerFn(func(c *Conn, err error, closed bool) {
        // we should never get here, but if we do it means we connected to a
        // host successfully which means our attempted protocol version worked
        if !closed {
            c.Close()
        }
    })

    var err error
    for _, host := range hosts {
        connCfg := *c.session.connCfg
        // try every supported protocol version, from the highest to the lowest
        for proto := highestProtocolVersionSupported; proto >= lowestProtocolVersionSupported; proto-- {
            connCfg.ProtoVersion = proto

            var conn *Conn
            conn, err = c.session.dial(c.session.ctx, host, &connCfg, handler)
            if conn != nil {
                conn.Close()
            }

            if err == nil {
                c.session.logger.Debug("Discovered protocol version using host.",
                    NewLogFieldInt("protocol_version", connCfg.ProtoVersion),
                    NewLogFieldIP("host_addr", host.ConnectAddress()),
                    NewLogFieldString("host_id", host.HostID()))
                return connCfg.ProtoVersion, nil
            }

            c.session.logger.Debug("Failed to discover protocol version using host.",
                NewLogFieldIP("host_addr", host.ConnectAddress()),
                NewLogFieldString("host_id", host.HostID()),
                NewLogFieldError("err", err))
        }
    }

    return 0, err
}
{code}
This would actually eliminate the issue this ticket relates to. The original 
issue is caused by a missing beta flag for protocol v5 with C* 3.11 combined 
with bad error wrapping, and both stop mattering if we simply try each 
supported protocol version. Of course, this slows down the protocol 
negotiation process a bit, but the tradeoff seems reasonable to me since it 
only happens during session initialisation.

However, you still won't be able to connect to C* 3.11 with proto v5, because 
there is no public API for setting the beta flag.

> Protocol version negotiation doesn't work if server replies with stream id 
> different than 0
> -------------------------------------------------------------------------------------------
>
>                 Key: CASSGO-97
>                 URL: https://issues.apache.org/jira/browse/CASSGO-97
>             Project: Apache Cassandra Go driver
>          Issue Type: Bug
>          Components: Core
>            Reporter: João Reis
>            Priority: Normal
>             Fix For: 2.x
>
>
> If the server's ProtocolError response comes with stream id 0 then [this 
> code|https://github.com/apache/cassandra-gocql-driver/blob/0326fae3617dd19b901f2e9a97479c04fc11e05a/conn.go#L685-L700]
>  will create the protocol error object.
> If the response comes with a positive stream id then [this 
> code|https://github.com/apache/cassandra-gocql-driver/blob/0326fae3617dd19b901f2e9a97479c04fc11e05a/conn.go#L1314-L1330]
>  will create the protocol error object. This latter way of creating the error 
> makes [the regex check not 
> work|https://github.com/apache/cassandra-gocql-driver/blob/0326fae3617dd19b901f2e9a97479c04fc11e05a/control.go#L210-L245].
> This was found when trying to connect to 
> [ZDM-Proxy|https://github.com/datastax/zdm-proxy/] but connecting to a C* 
> 3.11.x cluster works fine.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
