pnowojski commented on code in PR #22806:
URL: https://github.com/apache/flink/pull/22806#discussion_r1246174618


##########
flink-runtime/src/main/java/org/apache/flink/runtime/source/coordinator/SourceCoordinator.java:
##########
@@ -195,9 +195,19 @@ void announceCombinedWatermark() {
                 "Distributing maxAllowedWatermark={} to subTaskIds={}",
                 maxAllowedWatermark,
                 subTaskIds);
-        for (Integer subtaskId : subTaskIds) {
-            context.sendEventToSourceOperator(
-                    subtaskId, new 
WatermarkAlignmentEvent(maxAllowedWatermark));
+        // Because of Java-ThreadPoolExecutor will not schedule the period task
+        // if it throws an exception, so we should handle the potential 
exception like
+        // "subtask xx is not ready yet to receive events" to increase 
robustness.
+        try {
+            for (Integer subtaskId : subTaskIds) {
+                context.sendEventToSourceOperator(
+                        subtaskId, new 
WatermarkAlignmentEvent(maxAllowedWatermark));
+            }

Review Comment:
   > remove context.getCoordinatorExecutor() , and provide a function to 
schedule period task like below, PeriodTaskHook is a hook to let caller decide 
whether ignore the failure or fail job.
   
   ~Yes, I think we need something like that.~
   edit:
   
   or maybe not? Maybe we can add 
`SourceCoordinatorContext#sendEventToSourceOperatorIgnoreRetryableErrors` 
method, similar to `SourceCoordinatorContext#sendEventToSourceOperator` but 
with added logic for ignoring retryable errors.  Probably something like 
`SourceCoordinatorContext#handleUncaughtExceptionFromAsyncCall` but with 
ignoring retryable errors logic.
   
   > For SourceAlignment, its auto recoverable after task become ready, so we 
can ignore these failure?
   
   Ok, that should be fair enough, but we would need to distinguish those 
"subtask is not ready" errors from anything else. Ignoring all `Throwable 
ignored` is a bad idea, there might be other bugs or issues (`OOM`, `NPE`, and 
many many more).
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to