kfaraz commented on code in PR #19091:
URL: https://github.com/apache/druid/pull/19091#discussion_r2893945831
##########
server/src/main/java/org/apache/druid/segment/realtime/appenderator/TransactionalSegmentPublisher.java:
##########
@@ -33,8 +33,12 @@
public abstract class TransactionalSegmentPublisher
{
- private static final int QUIET_RETRIES = 3;
- private static final int MAX_RETRIES = 5;
+ private static final int QUIET_RETRIES = 5;
+
+ /**
+ * Approximately 10 minutes of retrying using {@link RetryUtils#nextRetrySleepMillis(int)}.
+ */
+ private static final int MAX_RETRIES = 13;
Review Comment:
I am concerned about the case where we decide not to kill off B because A is
still pending publish, and when B is finally ready to publish, A still hasn't
finished publishing.
In that case, it makes sense for B to keep retrying for a while.
The current retry count of 5 amounts to only about 1 minute of retrying.
I do want to improve the retry algorithm so that we throw a retryable
exception only if another task group is pending publish for the partitions
that the current task action is trying to update. In fact, let me try to
include that in this PR, since we already have the new method
`isAnotherGroupPublishing`.
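
For a rough sanity check of the retry durations discussed here, the cumulative wait can be estimated with a small standalone sketch. It assumes that `RetryUtils#nextRetrySleepMillis(int)` doubles a 1-second base sleep on each attempt and caps it at 60 seconds, and it ignores the random jitter entirely. Those constants, and the class and method names below, are assumptions for illustration only and are not taken from this PR.

```java
/**
 * Hypothetical helper (not part of Druid) that estimates the total time
 * spent sleeping across a number of retries under capped exponential backoff.
 */
final class BackoffEstimate
{
  // Assumed values; check RetryUtils for the real constants.
  static final long ASSUMED_BASE_SLEEP_MILLIS = 1_000L;
  static final long ASSUMED_MAX_SLEEP_MILLIS = 60_000L;

  /** Sleep before retry number nTry (1-based), without jitter. */
  static long sleepMillis(int nTry)
  {
    // Clamp the shift so that large nTry values cannot overflow a long.
    final int shift = Math.min(Math.max(nTry - 1, 0), 30);
    return Math.min(ASSUMED_MAX_SLEEP_MILLIS, ASSUMED_BASE_SLEEP_MILLIS << shift);
  }

  /** Sum of the sleeps before retries 1..numSleeps, without jitter. */
  static long totalSleepMillis(int numSleeps)
  {
    long total = 0;
    for (int nTry = 1; nTry <= numSleeps; nTry++) {
      total += sleepMillis(nTry);
    }
    return total;
  }

  public static void main(String[] args)
  {
    for (int numSleeps : new int[]{4, 5, 12, 13}) {
      System.out.printf(
          "%d sleeps: about %d seconds%n",
          numSleeps,
          totalSleepMillis(numSleeps) / 1000
      );
    }
  }
}
```

Whether `MAX_RETRIES` counts the first attempt affects the estimate by one sleep, which is why the sketch prints both `MAX_RETRIES - 1` and `MAX_RETRIES` sleeps for each of the two settings. The real totals also vary from run to run because of jitter.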
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]