[ 
https://issues.apache.org/jira/browse/CASSANDRA-20059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17897683#comment-17897683
 ] 

David Capwell commented on CASSANDRA-20059:
-------------------------------------------

.bq  I think most would agree that the intent of the method retryIndefinitely 
is pretty clear and that if you want to only retry for a certain period of 
time, this is not the method you should use

So, the API in TCM has Retry.Deadline as a common argument (rather than just 
Retry), and the method given this doesn't know the specific semantics you are 
trying to get working as the caller... they just know a Retry.Deadline retries 
until a specific deadline, so TCM code has evolved in Accord to give up 
retrying after this deadline (which is what the class defines its behavior 
as)... this has then caused messages to keep being sent even after giving up... 
 

.bq Basically, this is working as intended and I'm afraid you have been using 
it incorrectly

I don't feel it is working as intended as there is TCM code that doesn't retry 
indefinitely from the callers point of view, just in message sending... 

For TCM to handle this "correctly" it needs to ignore the deadline as its not 
accurate, and must only check reachedMax... if this is the case why do we have 
Retry.Deadline as the required type when intersecting with TCM rather than just 
Retry where this behavior would be correct?

> TCM's Retry.Deadline#retryIndefinitely is dangerous if used with 
> RemoteProcessor as the deadline does not impact message retries
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-20059
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20059
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Transactional Cluster Metadata
>            Reporter: David Capwell
>            Assignee: David Capwell
>            Priority: Normal
>             Fix For: 5.x
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code}
> public static Deadline retryIndefinitely(long timeoutNanos, Meter retryMeter)
> {
>     return new Deadline(Clock.Global.nanoTime() + timeoutNanos,
>                         new Retry.Jitter(Integer.MAX_VALUE, 
> DEFAULT_BACKOFF_MS, new Random(), retryMeter))
>     {
>         @Override
>         public boolean reachedMax()
>         {
>             return false;
>         }
>         @Override
>         public long remainingNanos()
>         {
>             return timeoutNanos;
>         }
>         public String toString()
>         {
>             return String.format("RetryIndefinitely{tries=%d}", 
> currentTries());
>         }
>     };
> }
> {code}
> Sample usage pattern (example is in Accord, but same pattern exists in 
> RemoteProcessor.commit)
> {code}
> Promise<LogState> request = new AsyncPromise<>();
> List<InetAddressAndPort> candidates = new 
> ArrayList<>(log.metadata().fullCMSMembers());
> sendWithCallbackAsync(request,
>                       Verb.TCM_RECONSTRUCT_EPOCH_REQ,
>                       new ReconstructLogState(lowEpoch, highEpoch, 
> includeSnapshot),
>                       new CandidateIterator(candidates),
>                       retryPolicy);
> return request.get(retryPolicy.remainingNanos(), TimeUnit.NANOSECONDS);
> {code}
> The issue here is that the networking retry has no clue that we gave up 
> waiting on the request, so we will keep retrying until success!  The reason 
> for this is “reachedMax” is used to see if its safe to run again, but it 
> isn’t as the deadline has passed!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to