Cameron Zemek created CASSANDRA-18866:
-----------------------------------------

             Summary: Node sends multiple inflight echos
                 Key: CASSANDRA-18866
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18866
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Cameron Zemek
         Attachments: echo.log

CASSANDRA-18854 rolled back the changes from CASSANDRA-18845. In particular, 
18845 had a change to only allow 1 inflight ECHO request at a time. As per 
18854, some tests have an error rate due to this change. I am creating this 
ticket to discuss this further. The current state also does not have retry 
logic; it just allows multiple ECHO requests inflight at the same time, so it 
is less likely that all ECHOs will time out or get lost.

With the change from 18845 applied, and some extra logging added to track what 
is going on, I do see it retrying ECHOs. Likewise, I patched a node to drop 
ECHO requests from another node, and I also see that node retrying ECHOs when 
it doesn't get a reply.
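
To make the discussion concrete, here is a minimal sketch of what a "one 
inflight ECHO per endpoint" guard could look like. It is an illustration only 
and not the 18845 patch: the class EchoGuard and all of its methods are 
hypothetical names, endpoints are plain strings, and the actual send is 
replaced by a counter.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not Cassandra code: allows at most one inflight
// ECHO per endpoint and releases the slot on a reply or a failure.
public class EchoGuard {
    // Endpoints that currently have an ECHO awaiting a reply.
    private final Set<String> inflight = ConcurrentHashMap.newKeySet();
    // Stand-in for the real network send; counts ECHOs actually sent.
    private final AtomicInteger sent = new AtomicInteger();

    /** Returns true if an ECHO was sent, false if one was already inflight. */
    public boolean trySendEcho(String endpoint) {
        if (!inflight.add(endpoint)) {
            return false; // an ECHO is already outstanding, so skip this one
        }
        sent.incrementAndGet();
        return true;
    }

    /** Called when the ECHO reply arrives. */
    public void onResponse(String endpoint) {
        inflight.remove(endpoint);
    }

    /** Called when the ECHO times out or is dropped. */
    public void onFailure(String endpoint) {
        inflight.remove(endpoint);
    }

    public int sentCount() {
        return sent.get();
    }
}
```

In this sketch the guard never retries by itself. A new ECHO goes out only if 
something calls trySendEcho again after the slot has been released, so whether 
a retry happens depends entirely on the caller.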

Therefore, I think the problem is more specific than the dropping of one ECHO 
request. Yes, there is no retry logic for failed ECHO requests, but this is 
the case both before and after 18845. ECHO requests are only sent via gossip 
verb handlers calling applyStateLocally. In these failed tests I am therefore 
assuming there are cases where it won't call markAlive when other nodes 
consider the node UP but it is marked DOWN by a node.


