[ 
https://issues.apache.org/jira/browse/FLINK-15105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992673#comment-16992673
 ] 

Congxian Qiu(klion26) edited comment on FLINK-15105 at 12/10/19 3:49 PM:
-------------------------------------------------------------------------

[~trohrmann]  In the previous comment, I'm not want to remove the whole 
{{FailureMapper}}, but just want to remove the {{Artificial failure}} throwing 
statement in {{FailureMapper}}#{{notifyCheckpointComplete just as the comment 
in the below code block.}}
{code:java}
public T map(T value) throws Exception {
   numProcessedRecords++;

   if (isReachedFailureThreshold()) {
      throw new Exception("Artificial failure.");
   }

   return value;
}

@Override
public void notifyCheckpointComplete(long checkpointId) throws Exception {
   numCompleteCheckpoints++;

   if (isReachedFailureThreshold()) { ////// =========== just want to remove 
this ===========
      throw new Exception("Artificial failure.");
   }
}
{code}
I think the problem here is that we throw Artifical failure when completing 
checkpoint

After throwing {{Artificial failure}} in 
{{FailureMapper#notifyCheckpointComplete}}

---> we got the following exception(attached below)

---> test failed when {{check_logs_for_errors}} using the commands in 
{{common.sh}}.
{code:java}
java.lang.RuntimeException: Error while confirming checkpoint
        at org.apache.flink.runtime.taskmanager.Task$2.run(Task.java:1205)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.Exception: Artificial failure.
        at 
org.apache.flink.streaming.tests.FailureMapper.notifyCheckpointComplete(FailureMapper.java:70)
        at 
org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.notifyCheckpointComplete(AbstractUdfStreamOperator.java:130)
        at 
org.apache.flink.streaming.runtime.tasks.StreamTask.notifyCheckpointComplete(StreamTask.java:822)
        at org.apache.flink.runtime.taskmanager.Task$2.run(Task.java:1200)
        ... 5 more

{code}
Remove the {{Artificial failure throwing}} in 
{{FailureMapper#notifyCheckpointComplete, we can still throw }}{{Aritifical 
failure}} in {{FailureMapper#notifyCheckpointComplete}}, IMHO, the {{Artificial 
failure throwing}} is just needed when the source is finite, but in the test 
job, we use 
[SequenceGeneratorSource|https://github.com/apache/flink/blob/171020749f7fccfa7781563569e2c88ea5e8b6a1/flink-end-to-end-tests/flink-datastream-allround-test/src/main/java/org/apache/flink/streaming/tests/SequenceGeneratorSource.java#L102],
 it is an infinite source.


was (Author: klion26):
[~trohrmann]  In the previous comment, I'm not want to remove the whole 
{{FailureMapper}}, but just want to remove the {{Artificial failure}} throwing 
statement in {{FailureMapper}}#{{notifyCheckpointComplete just as the comment 
in the below code block.}}

I think the problem here is that we throw Artifical failure when completing 
checkpoint(we'll throw Artifical failure in two places  in {{FailureMapper}})
{code:java}
public T map(T value) throws Exception {
   numProcessedRecords++;

   if (isReachedFailureThreshold()) {
      throw new Exception("Artificial failure.");
   }

   return value;
}

@Override
public void notifyCheckpointComplete(long checkpointId) throws Exception {
   numCompleteCheckpoints++;

   if (isReachedFailureThreshold()) { ////// =========== just want to remove 
this ===========
      throw new Exception("Artificial failure.");
   }
}
{code}
After throwing {{Artificial failure}} in 
{{FailureMapper#notifyCheckpointComplete}}

---> we got the following exception(attached below)

---> test failed when {{check_logs_for_errors}} using the commands in 
{{common.sh}}.
{code:java}
java.lang.RuntimeException: Error while confirming checkpoint
        at org.apache.flink.runtime.taskmanager.Task$2.run(Task.java:1205)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.Exception: Artificial failure.
        at 
org.apache.flink.streaming.tests.FailureMapper.notifyCheckpointComplete(FailureMapper.java:70)
        at 
org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.notifyCheckpointComplete(AbstractUdfStreamOperator.java:130)
        at 
org.apache.flink.streaming.runtime.tasks.StreamTask.notifyCheckpointComplete(StreamTask.java:822)
        at org.apache.flink.runtime.taskmanager.Task$2.run(Task.java:1200)
        ... 5 more

{code}
Remove the {{Artificial failure throwing}} in 
{{FailureMapper#notifyCheckpointComplete, we can still throw }}{{Aritifical 
failure}} in {{FailureMapper#notifyCheckpointComplete}}, IMHO, the {{Artificial 
failure throwing}} is just needed when the source is finite, but in the test 
job, we use 
[SequenceGeneratorSource|https://github.com/apache/flink/blob/171020749f7fccfa7781563569e2c88ea5e8b6a1/flink-end-to-end-tests/flink-datastream-allround-test/src/main/java/org/apache/flink/streaming/tests/SequenceGeneratorSource.java#L102],
 it is an infinite source.

> Resuming Externalized Checkpoint after terminal failure (rocks, incremental) 
> end-to-end test stalls on travis
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-15105
>                 URL: https://issues.apache.org/jira/browse/FLINK-15105
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.10.0
>            Reporter: Yu Li
>            Priority: Critical
>              Labels: test-stability
>             Fix For: 1.10.0
>
>
> Resuming Externalized Checkpoint after terminal failure (rocks, incremental) 
> end-to-end test fails on release-1.9 nightly build stalls with "The job 
> exceeded the maximum log length, and has been terminated".
> https://api.travis-ci.org/v3/job/621090394/log.txt



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to