[ 
https://issues.apache.org/jira/browse/CASSANDRA-18866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17768462#comment-17768462
 ] 

Cameron Zemek edited comment on CASSANDRA-18866 at 9/24/23 11:47 PM:
---------------------------------------------------------------------

Had to make the following change for some more dtests:

Previous:
{code:java}
            @Override
            public void onFailure(InetAddressAndPort from, RequestFailureReason failureReason)
            {
                logger.trace("Resending ECHO_REQ to {}", addr);
                Message<NoPayload> echoMessage = Message.out(ECHO_REQ, noPayload);
                MessagingService.instance().sendWithCallback(echoMessage, addr, this);
            }
{code}
After:
{code:java}
            @Override
            public void onFailure(InetAddressAndPort from, RequestFailureReason failureReason)
            {
                if (isEnabled())
                {
                    logger.trace("Resending ECHO_REQ to {}", addr);
                    Message<NoPayload> echoMessage = Message.out(ECHO_REQ, noPayload);
                    MessagingService.instance().sendWithCallback(echoMessage, addr, this);
                }
                else
                {
                    logger.trace("Failed ECHO_REQ to {}, aborting due to disabled gossip", addr);
                    inflightEcho.remove(addr);
                }
            }
{code}
[instaclustr/cassandra at CASSANDRA-18866-regressiontest (github.com)|https://github.com/instaclustr/cassandra/tree/CASSANDRA-18866-regressiontest]
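
To make the intent of the change easier to follow outside the Gossiper, here is a minimal standalone sketch of the same pattern: at most one ECHO in flight per endpoint, a resend from the failure callback while gossip is enabled, and the in-flight entry cleared when gossip is disabled so that a later ECHO is not blocked. This is not the actual patch. The class EchoRetrySketch, the EchoSender interface, the String addresses and the methods maybeSendEcho, onResponse and isInflight are all hypothetical stand-ins, and the assumption that inflightEcho is a concurrent set of endpoints is mine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; none of these names are Cassandra APIs.
class EchoRetrySketch
{
    /** Stand-in for the messaging layer (assumption, not MessagingService). */
    interface EchoSender
    {
        void send(String addr);
    }

    // Assumed shape of the in-flight tracking: one entry per endpoint.
    private final Set<String> inflightEcho = ConcurrentHashMap.newKeySet();
    private final EchoSender sender;
    private volatile boolean enabled = true;

    EchoRetrySketch(EchoSender sender)
    {
        this.sender = sender;
    }

    void setEnabled(boolean enabled)
    {
        this.enabled = enabled;
    }

    boolean isInflight(String addr)
    {
        return inflightEcho.contains(addr);
    }

    /** Sends an ECHO only if none is already in flight to addr. */
    boolean maybeSendEcho(String addr)
    {
        if (!inflightEcho.add(addr))
            return false; // an ECHO to this endpoint is already in flight
        sender.send(addr);
        return true;
    }

    /** A reply arrived, so the endpoint is no longer in flight. */
    void onResponse(String addr)
    {
        inflightEcho.remove(addr);
    }

    /** Mirrors the guarded onFailure: retry while enabled, otherwise clean up. */
    void onFailure(String addr)
    {
        if (enabled)
            sender.send(addr); // resend; the entry stays in inflightEcho
        else
            inflightEcho.remove(addr); // abort so a later ECHO is not blocked
    }

    public static void main(String[] args)
    {
        List<String> sent = new ArrayList<>();
        EchoRetrySketch gossip = new EchoRetrySketch(sent::add);
        gossip.maybeSendEcho("10.0.0.1");
        gossip.onFailure("10.0.0.1");   // gossip enabled: resend
        gossip.setEnabled(false);
        gossip.onFailure("10.0.0.1");   // gossip disabled: no resend, entry cleared
        System.out.println("sends=" + sent.size() + " inflight=" + gossip.isInflight("10.0.0.1"));
    }
}
```

Without the remove in the disabled branch, the endpoint would stay marked as in flight after the callback gives up, and in this sketch any later maybeSendEcho to it would be skipped.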


was (Author: cam1982):
Had to make the following change for some more dtests:

Previous:
{code:java}
            @Override
            public void onFailure(InetAddressAndPort from, RequestFailureReason failureReason)
            {
                logger.trace("Resending ECHO_REQ to {}", addr);
                Message<NoPayload> echoMessage = Message.out(ECHO_REQ, noPayload);
                MessagingService.instance().sendWithCallback(echoMessage, addr, this);
            }
{code}
After:
{code:java}
            @Override
            public void onFailure(InetAddressAndPort from, RequestFailureReason failureReason)
            {
                if (isEnabled())
                {
                    logger.trace("Resending ECHO_REQ to {}", addr);
                    Message<NoPayload> echoMessage = Message.out(ECHO_REQ, noPayload);
                    MessagingService.instance().sendWithCallback(echoMessage, addr, this);
                }
                else
                {
                    logger.trace("Failed ECHO_REQ to {}, aborting due to disabled gossip", addr);
                }
            }
{code}
[instaclustr/cassandra at CASSANDRA-18866-regressiontest (github.com)|https://github.com/instaclustr/cassandra/tree/CASSANDRA-18866-regressiontest]

> Node sends multiple inflight echos
> ----------------------------------
>
>                 Key: CASSANDRA-18866
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18866
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Cameron Zemek
>            Priority: Normal
>         Attachments: 18866-regression.patch, duplicates.log, echo.log
>
>
> CASSANDRA-18854 rolled back the changes from CASSANDRA-18845. In particular,
> 18845 included a change to allow only one inflight ECHO request at a time. As
> per 18854, some tests have an error rate due to this change. I am creating
> this ticket to discuss this further. The current state also has no retry
> logic; it just allows multiple ECHO requests inflight at the same time, so it
> is less likely that all ECHOs will time out or get lost.
> With the change from 18845 applied, plus some extra logging to track what is
> going on, I do see it retrying ECHOs. Likewise, I patched a node to drop ECHO
> requests from another node, and I also see it retrying ECHOs when it doesn't
> get a reply.
> Therefore, I think the problem is more specific than the dropping of one ECHO
> request. Yes, there is no retry logic for failed ECHO requests, but this is
> the case both before and after 18845. ECHO requests are only sent via gossip
> verb handlers calling applyStateLocally. I am therefore assuming that in
> these failed tests there are cases where markAlive is not called when other
> nodes consider the node UP but it is marked DOWN by one node.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
