Cameron Zemek created CASSANDRA-18866:
-----------------------------------------
Summary: Node sends multiple inflight echos
Key: CASSANDRA-18866
URL: https://issues.apache.org/jira/browse/CASSANDRA-18866
Project: Cassandra
Issue Type: Improvement
Reporter: Cameron Zemek
Attachments: echo.log
CASSANDRA-18854 rolled back the changes from CASSANDRA-18845. In particular,
18845 included a change to allow only one inflight ECHO request at a time. As per 18854,
some tests have an error rate because of this change. I am creating this ticket to
discuss this further. The current state also has no retry logic; it
just allows multiple ECHO requests to be inflight at the same time, which makes it less likely
that all of the ECHOs will time out or get lost.
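A minimal sketch of the two policies, assuming a per-endpoint guard that is released when an ECHO either gets a reply or fails/times out. `EchoGuard`, `trySend` and `complete` are hypothetical names for illustration only. They are not Cassandra's Gossiper API, and the ticket does not say exactly how 18845 releases the inflight slot.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of "one inflight ECHO per endpoint" vs "any number inflight".
// Not Cassandra code; names are illustrative only.
public class EchoGuard {
    private final boolean singleInflight;
    // Endpoints that currently have an unanswered ECHO (only used in single-inflight mode).
    private final Set<String> inflight = ConcurrentHashMap.newKeySet();

    public EchoGuard(boolean singleInflight) {
        this.singleInflight = singleInflight;
    }

    /** Returns true if an ECHO should be sent to the endpoint now. */
    public boolean trySend(String endpoint) {
        if (!singleInflight) {
            return true; // multiple-inflight policy: every call sends a fresh ECHO
        }
        return inflight.add(endpoint); // false if an ECHO to this endpoint is already inflight
    }

    /** Assumed to be called on ECHO reply and on failure/timeout; releases the slot. */
    public void complete(String endpoint) {
        inflight.remove(endpoint);
    }

    public static void main(String[] args) {
        EchoGuard single = new EchoGuard(true);
        System.out.println(single.trySend("10.0.0.2")); // true
        System.out.println(single.trySend("10.0.0.2")); // false: one already inflight
        single.complete("10.0.0.2");
        System.out.println(single.trySend("10.0.0.2")); // true: slot released

        EchoGuard multi = new EchoGuard(false);
        System.out.println(multi.trySend("10.0.0.2"));  // true
        System.out.println(multi.trySend("10.0.0.2"));  // true
    }
}
```

With the single-inflight guard, a second attempt for the same endpoint sends nothing until the first ECHO completes. Without it, every attempt sends another ECHO, which is why losing all of them becomes less likely even though neither policy retries on its own.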
With the change from 18845 applied, and some extra logging added to track what is
going on, I do see it retrying ECHOs. Likewise, I patched a node to drop ECHO
requests from a node, and I also see it retrying ECHOs when it doesn't get a reply.
Therefore, I think the problem is more specific than the loss of a single ECHO
request. Yes, there is no retry logic for failed ECHO requests, but this is the
case both before and after 18845. ECHO requests are only sent via gossip
verb handlers calling applyStateLocally. For these failing tests, I am therefore
assuming there are cases where markAlive is not called: other nodes consider
the node UP, but one node still has it marked DOWN.
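Because ECHOs are only sent when gossip handling reaches markAlive, a "retry" is simply a later gossip round that calls markAlive again. The round-based model below is a hypothetical simplification: `EchoRetryModel` and `roundMarkedUp` are illustrative names, replies are assumed to arrive within the same round, and the inflight guard is ignored. It shows that dropped ECHOs are recovered as long as later rounds keep reaching markAlive, and that the peer stays DOWN if they stop doing so.

```java
import java.util.function.IntPredicate;

// Hypothetical round-based model of ECHO "retries" driven by gossip rounds.
// Not Cassandra code; it only illustrates the reasoning in this ticket.
public class EchoRetryModel {
    /**
     * Returns the 1-based round in which the local node marks the peer UP,
     * or -1 if that never happens within maxRounds.
     *
     * @param callsMarkAlive whether gossip handling in round r reaches markAlive (and so sends an ECHO)
     * @param echoDropped    whether the ECHO sent in round r is dropped
     */
    public static int roundMarkedUp(int maxRounds, IntPredicate callsMarkAlive, IntPredicate echoDropped) {
        for (int r = 1; r <= maxRounds; r++) {
            if (!callsMarkAlive.test(r)) {
                continue; // no ECHO sent this round, so nothing is retried
            }
            if (!echoDropped.test(r)) {
                return r; // reply received: peer is marked UP
            }
            // ECHO dropped: the next round that reaches markAlive acts as the retry
        }
        return -1; // peer still marked DOWN by this node
    }

    public static void main(String[] args) {
        // Peer drops the first two ECHOs; every round reaches markAlive.
        System.out.println(roundMarkedUp(10, r -> true, r -> r <= 2));   // 3
        // Same drops, but after round 1 gossip never reaches markAlive again.
        System.out.println(roundMarkedUp(10, r -> r == 1, r -> r <= 2)); // -1
    }
}
```

The second call corresponds to the suspected failure mode: the ECHO loss itself is recoverable, and the node only stays DOWN because no later round calls markAlive.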
--
This message was sent by Atlassian Jira
(v8.20.10#820010)