[
https://issues.apache.org/jira/browse/CASSANDRA-19753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alex Petrov updated CASSANDRA-19753:
------------------------------------
Bug Category: Parent values: Correctness(12982); Level 1 values: Transient
Incorrect Response(12987)
Complexity: Normal
Component/s: Messaging/Client
Discovered By: Unit Test
Severity: Critical
Status: Open (was: Triage Needed)
> Not getting responses with concurrent stream IDs in native protocol v5
> ----------------------------------------------------------------------
>
> Key: CASSANDRA-19753
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19753
> Project: Cassandra
> Issue Type: Bug
> Components: Messaging/Client
> Reporter: Andrea Leopardi
> Priority: Normal
> Attachments: xandra.log
>
>
> This is not gonna be an easy bug to report or to give a great set of repro
> steps for, so apologies in advance. I’m one of the authors and the maintainer
> of [Xandra|https://github.com/whatyouhide/xandra], the Cassandra client for
> Elixir.
> We noticed an issue with request timeouts in a new version of our client.
> Just for reference, the issue is [this
> one|https://github.com/whatyouhide/xandra/issues/356].
> After some debugging, we figured out that the issue was limited to *native
> protocol v5*. With native protocol v5, the issue shows up in C* 4.1 and 5.0.
> With native protocol v4, those versions (4.1 and 5.0) both work fine. I'm
> running C* in a Docker container, but I've had folks reproduce this with all
> sorts of C* setups.
> h2. The Issue
> The new version of our client in question uses concurrent requests. We assign
> each request a sequential stream ID ({{1}}, {{2}}, ...). We behave in a
> compliant way with [section 2.4.1.3. of the native protocol v5
> spec|https://github.com/apache/cassandra/blob/e7cf38b5de6f804ce121e7a676576135db0c4bb1/doc/native_protocol_v5.spec#L316C1-L316C9]—to
> the best of my knowledge.
> Now, it seems like C* does not respond to all requests this way. We have a
> [simple test|https://github.com/whatyouhide/xandra/pull/368] in our repo that
> reproduces this. It just issues two requests in parallel (with stream IDs
> {{1}} and {{2}}) and then keeps issuing requests as soon as there are
> responses. Almost 100% of the time, we don't get the response on at least
> one stream. I've also attached some debug logs that show this in case it can
> be helpful (from the client perspective). The {{<<56, 0, 2, 67, 161, ...>>}}
> syntax is Erlang's syntax for bytestrings, where each number is the decimal
> value for a single byte. You can see in the logs that we never get the
> response frame on stream ID 1. Sometimes it's stream ID 2, or 3, or whatever.
> I’m pretty short on ideas for what to do next on our end. I’ve tried changing
> the socket buffer size as well (from {{10}} bytes to {{1000000}} bytes) to
> get the packets to split up in all sorts of places, but everything works as
> expected _except_ for the responses that never come out of C*.
> Any other help is appreciated here, but I've started to suspect this might be
> something with C*. It could totally not be, but I figured it was worth
> posting here.
> Thank you all in advance folks! 💟
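
For readers who want to reason about the symptom described above, here is a minimal sketch of the bookkeeping involved: hand out sequential stream IDs, read the stream ID from each response envelope header, and report which streams never got an answer. It is written in Python rather than Elixir and is not Xandra's code. The names PendingStreams, stream_ids_in and encode_envelope are hypothetical. It assumes the outer v5 framing (CRC-protected frame header and trailer) has already been removed and that the payload holds whole envelopes; it does not handle stream ID reuse or wraparound.

```python
import struct

# Envelope header used by native protocol v3 and later:
# version (1 byte), flags (1 byte), stream ID (signed 2 bytes),
# opcode (1 byte), body length (4 bytes), all big-endian.
HEADER = struct.Struct(">BBhBI")


def encode_envelope(version, flags, stream_id, opcode, body=b""):
    """Build one envelope (hypothetical helper, used here to fake responses)."""
    return HEADER.pack(version, flags, stream_id, opcode, len(body)) + body


def stream_ids_in(payload):
    """Return the stream ID of every complete envelope found in payload."""
    ids = []
    offset = 0
    while offset + HEADER.size <= len(payload):
        _ver, _flags, stream_id, _opcode, length = HEADER.unpack_from(payload, offset)
        end = offset + HEADER.size + length
        if end > len(payload):
            break  # body is truncated; a real client would wait for more bytes
        ids.append(stream_id)
        offset = end
    return ids


class PendingStreams:
    """Hands out sequential stream IDs and remembers which are unanswered."""

    def __init__(self):
        self._next_id = 1
        self._pending = set()

    def allocate(self):
        stream_id = self._next_id
        self._next_id += 1
        self._pending.add(stream_id)
        return stream_id

    def mark_received(self, payload):
        # Clear every stream whose response envelope appears in the payload.
        for stream_id in stream_ids_in(payload):
            self._pending.discard(stream_id)

    def unanswered(self):
        return sorted(self._pending)


if __name__ == "__main__":
    tracker = PendingStreams()
    tracker.allocate()  # stream ID 1
    tracker.allocate()  # stream ID 2
    # Fake a v5 RESULT response (version byte 0x85, opcode 0x08) on stream 2 only.
    tracker.mark_received(encode_envelope(0x85, 0, 2, 0x08, b"\x00\x00\x00\x01"))
    print(tracker.unanswered())
```

Feeding every decoded frame payload through mark_received and checking unanswered() after a timeout would show which stream IDs the client is still waiting on, which is the situation the attached log shows for stream ID 1.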