[
https://issues.apache.org/jira/browse/CASSANDRA-20059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17897683#comment-17897683
]
David Capwell commented on CASSANDRA-20059:
-------------------------------------------
bq. I think most would agree that the intent of the method retryIndefinitely
is pretty clear and that if you want to only retry for a certain period of
time, this is not the method you should use
So, the API in TCM takes Retry.Deadline as a common argument (rather than just
Retry), and a method handed one doesn't know the specific semantics you, as the
caller, are trying to get. It only knows that a Retry.Deadline retries until a
specific deadline, so the TCM code in Accord has evolved to give up retrying
once that deadline passes (which is the behavior the class defines). The result
is that messages keep being sent even after the caller has given up.
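To make that failure mode concrete, here is a minimal, self-contained sketch. It is plain Java, not Cassandra code: OrphanedRetrySketch and everything inside it are hypothetical names, and the "sender" is just a thread that fails every attempt. It assumes the send loop is gated only by a check that never trips (the way reachedMax() behaves for retryIndefinitely), while the caller waits on the promise with a timeout.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; none of these names exist in Cassandra.
final class OrphanedRetrySketch
{
    /** Returns { attempts made when the caller gave up, attempts made shortly after }. */
    static int[] run(long callerTimeoutMillis)
    {
        CompletableFuture<String> request = new CompletableFuture<>();
        AtomicInteger attempts = new AtomicInteger();
        AtomicBoolean stopDemo = new AtomicBoolean(false); // exists only so the demo terminates

        // Stand-in for the networking retry: nothing here looks at the caller's deadline.
        Thread sender = new Thread(() -> {
            while (!stopDemo.get())
            {
                attempts.incrementAndGet(); // every attempt "fails", so we go around again
                try { Thread.sleep(1); }
                catch (InterruptedException e) { return; }
            }
        });
        sender.setDaemon(true);
        sender.start();

        try
        {
            try
            {
                // Stand-in for request.get(retryPolicy.remainingNanos(), NANOSECONDS)
                request.get(callerTimeoutMillis, TimeUnit.MILLISECONDS);
            }
            catch (TimeoutException e)
            {
                // The caller gives up here, but the sender is never told.
            }
            int atGiveUp = attempts.get();
            Thread.sleep(50);
            int later = attempts.get();
            return new int[]{ atGiveUp, later };
        }
        catch (InterruptedException | ExecutionException e)
        {
            throw new RuntimeException(e);
        }
        finally
        {
            stopDemo.set(true);
        }
    }

    public static void main(String[] args)
    {
        int[] r = run(20);
        System.out.println("attempts when caller gave up: " + r[0] + ", shortly after: " + r[1]);
    }
}
```

The second count is larger than the first: the caller has already timed out, yet attempts keep being made. The exact numbers depend on thread scheduling.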
bq. Basically, this is working as intended and I'm afraid you have been using
it incorrectly
I don't feel it is working as intended, as there is TCM code that, from the
caller's point of view, doesn't retry indefinitely; only the message sending
does. For TCM to handle this "correctly" it needs to ignore the deadline, as
it's not accurate, and must only check reachedMax. If this is the case, why do
we have Retry.Deadline as the required type when interacting with TCM, rather
than just Retry, where this behavior would be correct?
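For comparison, here is a rough sketch of what a send loop looks like when it honors both the try budget and the deadline. This is not the TCM implementation: the Deadline class below is a hypothetical stand-in for Retry.Deadline, expired() is an assumed helper, and the clock is injected so the example runs deterministically.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

// Hypothetical illustration only; this Deadline is NOT Cassandra's Retry.Deadline.
final class DeadlineAwareRetrySketch
{
    static final class Deadline
    {
        final long deadlineNanos;
        final int maxTries;
        int tries = 0;

        Deadline(long deadlineNanos, int maxTries)
        {
            this.deadlineNanos = deadlineNanos;
            this.maxTries = maxTries;
        }

        boolean reachedMax()
        {
            return tries >= maxTries;
        }

        // Assumed helper: subtraction keeps the comparison safe if nanoTime wraps.
        boolean expired(long nowNanos)
        {
            return nowNanos - deadlineNanos >= 0;
        }
    }

    /**
     * Retries until an attempt succeeds, the try budget is used up, or the
     * deadline has passed. Returns the number of attempts made.
     */
    static int sendWithRetries(Deadline policy, LongSupplier clock, BooleanSupplier attempt)
    {
        while (!policy.reachedMax() && !policy.expired(clock.getAsLong()))
        {
            policy.tries++;
            if (attempt.getAsBoolean())
                break;
        }
        return policy.tries;
    }

    public static void main(String[] args)
    {
        long[] now = { 0 };
        // Fake clock advancing 10ns per reading; deadline at 35ns; every attempt fails.
        int attempts = sendWithRetries(new Deadline(35, Integer.MAX_VALUE),
                                       () -> now[0] += 10,
                                       () -> false);
        System.out.println("attempts before the deadline stopped the loop: " + attempts);
    }
}
```

With a loop like this, a policy whose reachedMax() always returns false still stops sending once the deadline passes, which matches what a caller waiting on remainingNanos() would expect.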
> TCM's Retry.Deadline#retryIndefinitely is dangerous if used with
> RemoteProcessor as the deadline does not impact message retries
> --------------------------------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-20059
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20059
> Project: Cassandra
> Issue Type: Bug
> Components: Transactional Cluster Metadata
> Reporter: David Capwell
> Assignee: David Capwell
> Priority: Normal
> Fix For: 5.x
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> {code}
> public static Deadline retryIndefinitely(long timeoutNanos, Meter retryMeter)
> {
>     return new Deadline(Clock.Global.nanoTime() + timeoutNanos,
>                         new Retry.Jitter(Integer.MAX_VALUE,
>                                          DEFAULT_BACKOFF_MS, new Random(), retryMeter))
>     {
>         @Override
>         public boolean reachedMax()
>         {
>             return false;
>         }
>         @Override
>         public long remainingNanos()
>         {
>             return timeoutNanos;
>         }
>         public String toString()
>         {
>             return String.format("RetryIndefinitely{tries=%d}",
>                                  currentTries());
>         }
>     };
> }
> {code}
> Sample usage pattern (example is in Accord, but same pattern exists in
> RemoteProcessor.commit)
> {code}
> Promise<LogState> request = new AsyncPromise<>();
> List<InetAddressAndPort> candidates = new ArrayList<>(log.metadata().fullCMSMembers());
> sendWithCallbackAsync(request,
>                       Verb.TCM_RECONSTRUCT_EPOCH_REQ,
>                       new ReconstructLogState(lowEpoch, highEpoch, includeSnapshot),
>                       new CandidateIterator(candidates),
>                       retryPolicy);
> return request.get(retryPolicy.remainingNanos(), TimeUnit.NANOSECONDS);
> {code}
> The issue here is that the networking retry has no clue that we gave up
> waiting on the request, so it will keep retrying until it succeeds! The
> reason for this is that “reachedMax” is used to decide whether it's safe to
> run again, but it isn’t, as the deadline has passed!
--
This message was sent by Atlassian Jira
(v8.20.10#820010)