Robert Metzger created FLINK-23925:
--------------------------------------
Summary: HistoryServer: Archiving job with more than one attempt fails
Key: FLINK-23925
URL: https://issues.apache.org/jira/browse/FLINK-23925
Project: Flink
Issue Type: Bug
Components: Runtime / Coordination
Affects Versions: 1.13.2
Reporter: Robert Metzger
Error:
{code}
2021-08-23 16:26:01,953 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Disconnect job manager [email protected]://flink@localhost:6123/user/rpc/jobmanager_2 for job ca9f6a073d311d60f457a1c4243e7dc3 from the resource manager.
2021-08-23 16:26:02,137 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Could not archive completed job CarTopSpeedWindowingExample(ca9f6a073d311d60f457a1c4243e7dc3) to the history server.
java.util.concurrent.CompletionException: java.lang.IllegalArgumentException: attempt does not exist
    at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273) ~[?:1.8.0_252]
    at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280) [?:1.8.0_252]
    at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1643) [?:1.8.0_252]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_252]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_252]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]
Caused by: java.lang.IllegalArgumentException: attempt does not exist
    at org.apache.flink.runtime.executiongraph.ArchivedExecutionVertex.getPriorExecutionAttempt(ArchivedExecutionVertex.java:109) ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
    at org.apache.flink.runtime.executiongraph.ArchivedExecutionVertex.getPriorExecutionAttempt(ArchivedExecutionVertex.java:31) ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
    at org.apache.flink.runtime.rest.handler.job.SubtaskExecutionAttemptDetailsHandler.archiveJsonWithPath(SubtaskExecutionAttemptDetailsHandler.java:140) ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
    at org.apache.flink.runtime.webmonitor.history.OnlyExecutionGraphJsonArchivist.archiveJsonWithPath(OnlyExecutionGraphJsonArchivist.java:51) ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
    at org.apache.flink.runtime.webmonitor.WebMonitorEndpoint.archiveJsonWithPath(WebMonitorEndpoint.java:1031) ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
    at org.apache.flink.runtime.dispatcher.JsonResponseHistoryServerArchivist.lambda$archiveExecutionGraph$0(JsonResponseHistoryServerArchivist.java:61) ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
    at org.apache.flink.util.function.ThrowingRunnable.lambda$unchecked$0(ThrowingRunnable.java:49) ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
    at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640) ~[?:1.8.0_252]
    ... 3 more
{code}
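The report does not state a root cause, so the following is only a minimal sketch of one way the exception in the trace could arise: an archiver loops over every attempt number below the current one, while the vertex holds fewer prior attempts than that number implies. All class and method names here (AttemptArchiveSketch, Vertex, archiveStrict, archiveLenient) are hypothetical stand-ins, not Flink's actual classes, and the "attempt number 1 with an empty history" state after a restart is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of the failure; not Flink's real code. */
public class AttemptArchiveSketch {

    /** Stand-in for an archived vertex: a current attempt number plus retained prior attempts. */
    static final class Vertex {
        final int currentAttemptNumber;
        final List<String> priorAttempts;

        Vertex(int currentAttemptNumber, List<String> priorAttempts) {
            this.currentAttemptNumber = currentAttemptNumber;
            this.priorAttempts = priorAttempts;
        }

        /** Throws with the same message as in the stack trace when the index is out of range. */
        String getPriorAttempt(int attemptNumber) {
            if (attemptNumber < 0 || attemptNumber >= priorAttempts.size()) {
                throw new IllegalArgumentException("attempt does not exist");
            }
            return priorAttempts.get(attemptNumber);
        }
    }

    /** Assumes every attempt number below the current one has a stored entry. */
    static List<String> archiveStrict(Vertex v) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < v.currentAttemptNumber; i++) {
            out.add(v.getPriorAttempt(i));
        }
        return out;
    }

    /** Defensive variant: archives only the attempts that are actually present. */
    static List<String> archiveLenient(Vertex v) {
        List<String> out = new ArrayList<>();
        int available = Math.min(v.currentAttemptNumber, v.priorAttempts.size());
        for (int i = 0; i < available; i++) {
            out.add(v.getPriorAttempt(i));
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed state after a restart: attempt number 1, but no retained history.
        Vertex restarted = new Vertex(1, new ArrayList<>());
        try {
            archiveStrict(restarted);
        } catch (IllegalArgumentException e) {
            System.out.println("strict: " + e.getMessage());
        }
        System.out.println("lenient: " + archiveLenient(restarted).size() + " attempts archived");
    }
}
```

Whether skipping missing attempts, as archiveLenient does, is the right behaviour for the HistoryServer is an open design question; the sketch only shows that a counter/history mismatch is enough to produce "attempt does not exist".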
Steps to reproduce:
- Start a Flink job manager in reactive mode:
mkdir usrlib
cp ./examples/streaming/TopSpeedWindowing.jar usrlib/
# Submit Job in Reactive Mode
./bin/standalone-job.sh start -Dscheduler-mode=reactive -Dexecution.checkpointing.interval="10s" -j org.apache.flink.streaming.examples.windowing.TopSpeedWindowing
# Start first TaskManager
./bin/taskmanager.sh start
- Add another TaskManager to trigger a restart
- Cancel the job
The failure then appears in the JobManager logs.