1996fanrui commented on code in PR #21281:
URL: https://github.com/apache/flink/pull/21281#discussion_r1018967494
##########
flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointFailureManager.java:
##########
@@ -204,7 +204,8 @@ private void checkFailureAgainstCounter(
if (continuousFailureCounter.get() > tolerableCpFailureNumber) {
clearCount();
errorHandler.accept(
- new FlinkRuntimeException(EXCEEDED_CHECKPOINT_TOLERABLE_FAILURE_MESSAGE));
+ new FlinkRuntimeException(
+ EXCEEDED_CHECKPOINT_TOLERABLE_FAILURE_MESSAGE, exception));
Review Comment:
@Myasuka Thanks for your feedback.
You are right: the correct way is to check the full information in the JM log or
the checkpoint UI.
Actually, I added this for a few reasons:
- Some Flink platforms collect job exceptions. When the job fails and the JM stops,
users can easily see the root cause of the last checkpoint failure through the
exception. At that point the Web UI has already stopped, so the exception is more
convenient to check than the JM log.
- Attaching the root cause does not change the original logic.
- When developing features, ITCases are often run without logging enabled.
When an ITCase fails, it only shows `Exceeded checkpoint tolerable failure
threshold.` without the root cause, which makes the problem hard to locate. 😂
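To make the last point concrete, here is a small, self-contained Java sketch. It is not the actual `CheckpointFailureManager` code: the class `ToleranceDemo`, its methods, and the failure message are hypothetical, and a plain `RuntimeException` stands in for `FlinkRuntimeException` so the sketch has no dependencies. It only models what the diff changes, namely passing the last checkpoint failure as the cause of the job-failure exception once the consecutive-failure counter exceeds the tolerable number.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/**
 * Hypothetical model of a consecutive-failure counter: once the number of consecutive
 * checkpoint failures exceeds the tolerable number, the error handler is called.
 * RuntimeException stands in for FlinkRuntimeException (assumption for self-containment).
 */
class ToleranceDemo {
    static final String EXCEEDED_MESSAGE = "Exceeded checkpoint tolerable failure threshold.";

    private final int tolerableFailures;
    private final boolean attachCause;
    private final Consumer<Throwable> errorHandler;
    private final AtomicInteger continuousFailures = new AtomicInteger();

    ToleranceDemo(int tolerableFailures, boolean attachCause, Consumer<Throwable> errorHandler) {
        this.tolerableFailures = tolerableFailures;
        this.attachCause = attachCause;
        this.errorHandler = errorHandler;
    }

    /** Records one checkpoint failure and fails the "job" once the threshold is exceeded. */
    void onCheckpointFailure(Exception failure) {
        if (continuousFailures.incrementAndGet() > tolerableFailures) {
            continuousFailures.set(0);
            // Without the cause, only the generic message reaches the handler;
            // with it, the last checkpoint failure travels along as getCause().
            errorHandler.accept(
                    attachCause
                            ? new RuntimeException(EXCEEDED_MESSAGE, failure)
                            : new RuntimeException(EXCEEDED_MESSAGE));
        }
    }

    /** Simulates tolerable + 1 consecutive failures and returns what the handler received. */
    static Throwable trigger(boolean attachCause) {
        Throwable[] captured = new Throwable[1];
        ToleranceDemo demo = new ToleranceDemo(2, attachCause, t -> captured[0] = t);
        for (int i = 0; i < 3; i++) {
            demo.onCheckpointFailure(new Exception("simulated checkpoint failure"));
        }
        return captured[0];
    }

    public static void main(String[] args) {
        Throwable before = trigger(false);
        Throwable after = trigger(true);
        System.out.println("without cause: " + before + ", cause = " + before.getCause());
        System.out.println("with cause:    " + after + ", cause = " + after.getCause());
    }
}
```

When the exception with the cause attached is printed via `printStackTrace()`, Java appends a `Caused by:` section, so a failing ITCase would show the underlying checkpoint failure even without any log output.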
I also don't think this change is strictly necessary. Please take a look at these
reasons; I will close this PR if it is not needed. Thanks~
--
This is an automated message from the Apache Git Service.