[
https://issues.apache.org/jira/browse/FLINK-20252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235878#comment-17235878
]
Yuan Mei commented on FLINK-20252:
----------------------------------
JM Log:
{code:java}
2020-11-20 11:05:40,914 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: Custom Source (1/1) (a431944bc4634099f59062c0e929fa9e) switched from SCHEDULED to DEPLOYING.
2020-11-20 11:05:40,914 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Deploying Source: Custom Source (1/1) (attempt #0) with attempt id a431944bc4634099f59062c0e929fa9e to container_e17_1602580065114_0666_01_000002 @ i22a12256.sqa.eu95.tbsite.net (dataPort=57573) with allocation id 16cd53e0dba4d01daabfc95e2d794a38
2020-11-20 11:05:40,918 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Map (1/1) (c33ea0a9306f7559f1c714f0527b30e4) switched from SCHEDULED to DEPLOYING.
2020-11-20 11:05:40,918 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Deploying Map (1/1) (attempt #0) with attempt id c33ea0a9306f7559f1c714f0527b30e4 to container_e17_1602580065114_0666_01_000003 @ i22a09265.sqa.eu95.tbsite.net (dataPort=33785) with allocation id c1d9691579e29b35c2ed54d9721a29f1
2020-11-20 11:05:40,920 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Sink: Print to Std. Out (1/1) (1f3db4f707fd453ceef6aa72c2003f43) switched from SCHEDULED to DEPLOYING.
2020-11-20 11:05:40,920 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Deploying Sink: Print to Std. Out (1/1) (attempt #0) with attempt id 1f3db4f707fd453ceef6aa72c2003f43 to container_e17_1602580065114_0666_01_000003 @ i22a09265.sqa.eu95.tbsite.net (dataPort=33785) with allocation id c1d9691579e29b35c2ed54d9721a29f1
2020-11-20 11:05:41,023 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: Custom Source (1/1) (a431944bc4634099f59062c0e929fa9e) switched from DEPLOYING to RUNNING.
2020-11-20 11:05:41,093 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Map (1/1) (c33ea0a9306f7559f1c714f0527b30e4) switched from DEPLOYING to RUNNING.
2020-11-20 11:05:41,093 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Sink: Print to Std. Out (1/1) (1f3db4f707fd453ceef6aa72c2003f43) switched from DEPLOYING to RUNNING.
2020-11-20 11:05:42,900 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Map (1/1) (c33ea0a9306f7559f1c714f0527b30e4) switched from RUNNING to FAILED on org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot@5f54bf85.
java.lang.Exception: Artificial Test Failure
    at org.apache.flink.streaming.examples.wordcount.ApproximateFailover$FailingMapper.map(ApproximateFailover.java:139) ~[ApproximateFailover.jar:1.12-SNAPSHOT]
    at org.apache.flink.streaming.api.operators.StreamMap.processElement(StreamMap.java:41) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
    at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:193) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
    at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.processElement(StreamTaskNetworkInput.java:179) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
    at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:152) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
    at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:67) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:372) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:186) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:574) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:538) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:722) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:547) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
    at java.lang.Thread.run(Thread.java:834) ~[?:1.8.0_102]
2020-11-20 11:05:42,920 INFO org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy [] - Calculating tasks to restart to recover the failed task 0a448493b4782967b150582570326227_0.
2020-11-20 11:05:42,920 INFO org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy [] - 2 tasks should be restarted to recover the failed task 0a448493b4782967b150582570326227_0.
2020-11-20 11:05:42,921 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Job Streaming WordCount (f6a90cd24b77fa6c49657ac698424c7a) switched from state RUNNING to RESTARTING.
2020-11-20 11:05:42,921 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Sink: Print to Std. Out (1/1) (1f3db4f707fd453ceef6aa72c2003f43) switched from RUNNING to CANCELING.
2020-11-20 11:05:42,928 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Sink: Print to Std. Out (1/1) (1f3db4f707fd453ceef6aa72c2003f43) switched from CANCELING to CANCELED.
2020-11-20 11:05:42,930 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Job Streaming WordCount (f6a90cd24b77fa6c49657ac698424c7a) switched from state RESTARTING to RUNNING.
2020-11-20 11:05:42,935 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Map (1/1) (eb45ad053c6c42400ba039535ab45c93) switched from CREATED to SCHEDULED.
2020-11-20 11:05:42,935 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Sink: Print to Std. Out (1/1) (18a234f28ed29e38f65ac1e1aa36319c) switched from CREATED to SCHEDULED.
2020-11-20 11:05:42,937 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Map (1/1) (eb45ad053c6c42400ba039535ab45c93) switched from SCHEDULED to DEPLOYING.
2020-11-20 11:05:42,937 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Deploying Map (1/1) (attempt #1) with attempt id eb45ad053c6c42400ba039535ab45c93 to container_e17_1602580065114_0666_01_000003 @ i22a09265.sqa.eu95.tbsite.net (dataPort=33785) with allocation id c1d9691579e29b35c2ed54d9721a29f1
2020-11-20 11:05:42,937 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Sink: Print to Std. Out (1/1) (18a234f28ed29e38f65ac1e1aa36319c) switched from SCHEDULED to DEPLOYING.
2020-11-20 11:05:42,937 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Deploying Sink: Print to Std. Out (1/1) (attempt #1) with attempt id 18a234f28ed29e38f65ac1e1aa36319c to container_e17_1602580065114_0666_01_000003 @ i22a09265.sqa.eu95.tbsite.net (dataPort=33785) with allocation id c1d9691579e29b35c2ed54d9721a29f1
2020-11-20 11:05:42,946 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Map (1/1) (eb45ad053c6c42400ba039535ab45c93) switched from DEPLOYING to RUNNING.
2020-11-20 11:05:42,947 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Sink: Print to Std. Out (1/1) (18a234f28ed29e38f65ac1e1aa36319c) switched from DEPLOYING to RUNNING.
{code}
> Manual Test for Approximate Local Recovery
> ------------------------------------------
>
> Key: FLINK-20252
> URL: https://issues.apache.org/jira/browse/FLINK-20252
> Project: Flink
> Issue Type: Sub-task
> Reporter: Yuan Mei
> Priority: Critical
> Labels: pull-request-available
> Fix For: 1.12.0
>
>
> This manual test verifies that approximate failover works as expected on a
> cluster.
>
> *Test Job Set up:*
> source -> mapper -> printer
> mapper fails after a certain number of records are received.
> link: [https://github.com/apache/flink/pull/14146]
> *Cluster Set up:*
> Test on Yarn clusters.
> Add this to flink-conf.yaml:
> jobmanager.scheduler.scheduling-strategy: legacy
>
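For readers without access to the linked PR, the mapper behaviour described in the test job setup ("fails after a certain number of records are received") can be sketched roughly as below. This is a minimal illustration, not the code from the PR: the class names, the `ThrowingMapper` interface (a local stand-in shaped like Flink's `MapFunction#map`, so the sketch compiles without Flink on the classpath), and the `failAfter` threshold are all assumptions. Only the exception message is taken from the JM log.

```java
/** Hypothetical sketch of a mapper that fails after a fixed number of records. */
public class FailingMapperSketch {

    /** Local stand-in with the same shape as Flink's MapFunction#map (assumption). */
    interface ThrowingMapper<I, O> {
        O map(I value) throws Exception;
    }

    /** Passes the first failAfter records through unchanged, then throws. */
    static final class FailingMapper<T> implements ThrowingMapper<T, T> {
        private final long failAfter; // hypothetical threshold, not from the PR
        private long seen;            // records received so far by this instance

        FailingMapper(long failAfter) {
            this.failAfter = failAfter;
        }

        @Override
        public T map(T value) throws Exception {
            seen++;
            if (seen > failAfter) {
                // Same message as the exception shown in the JM log.
                throw new Exception("Artificial Test Failure");
            }
            return value;
        }
    }

    public static void main(String[] args) {
        FailingMapper<String> mapper = new FailingMapper<>(3);
        String[] records = {"to", "be", "or", "not", "to"};
        for (String record : records) {
            try {
                System.out.println("mapped: " + mapper.map(record));
            } catch (Exception e) {
                System.out.println("failed: " + e.getMessage());
                break;
            }
        }
    }
}
```

In the real job the equivalent class would implement Flink's `MapFunction` and sit between the source and the print sink.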
--
This message was sent by Atlassian Jira
(v8.3.4#803005)