cryptoe commented on code in PR #13353:
URL: https://github.com/apache/druid/pull/13353#discussion_r1064640320


##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/Limits.java:
##########
@@ -68,4 +68,20 @@
    * Maximum size of the kernel manipulation queue in {@link org.apache.druid.msq.indexing.MSQControllerTask}.
    */
   public static final int MAX_KERNEL_MANIPULATION_QUEUE_SIZE = 100_000;
+
+  /**
+   * Maximum relaunches across all workers.
+   */
+  public static final int TOTAL_RELAUNCH_LIMIT = 100;
+
+  /**
+   * Maximum relaunches per worker. Initial run is not a relaunch. The worker will be spawned 1 + workerRelaunchLimit times before erroring out.
+   */
+  public static final int PER_WORKER_RELAUNCH_LIMIT = 2;

Review Comment:
   I think 2 is fine, though I would argue for keeping it at 1 :) If the job fails once,
say due to an OOM, the likelihood of it failing again on relaunch is very high.
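
   To make the interaction between the two limits concrete, here is a minimal sketch. `RelaunchBudget` and `tryRelaunch` are hypothetical names used only for illustration and are not part of this PR; the sketch only assumes the semantics in the Javadoc above (the initial run is not a relaunch, and a worker may be relaunched at most `PER_WORKER_RELAUNCH_LIMIT` times, subject to `TOTAL_RELAUNCH_LIMIT` across all workers).

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of Druid): tracks relaunches against a
 * per-worker limit and a total limit across all workers.
 */
class RelaunchBudget {
  private final int perWorkerLimit;
  private final int totalLimit;
  private final Map<Integer, Integer> relaunchesByWorker = new HashMap<>();
  private int totalRelaunches = 0;

  RelaunchBudget(int perWorkerLimit, int totalLimit) {
    this.perWorkerLimit = perWorkerLimit;
    this.totalLimit = totalLimit;
  }

  /**
   * Records a relaunch and returns true if both budgets allow it;
   * returns false (recording nothing) if either limit is exhausted.
   * The initial run of a worker is not counted here.
   */
  boolean tryRelaunch(int workerNumber) {
    int used = relaunchesByWorker.getOrDefault(workerNumber, 0);
    if (used >= perWorkerLimit || totalRelaunches >= totalLimit) {
      return false;
    }
    relaunchesByWorker.put(workerNumber, used + 1);
    totalRelaunches++;
    return true;
  }

  int totalRelaunches() {
    return totalRelaunches;
  }
}
```

   With a per-worker limit of 2, a worker that keeps hitting the same OOM runs three times (1 initial run + 2 relaunches) before the job errors out; with a limit of 1 it runs twice.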



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

