Brian Zhou created FLINK-18641:
----------------------------------
Summary: "Failure to finalize checkpoint" error in
MasterTriggerRestoreHook
Key: FLINK-18641
URL: https://issues.apache.org/jira/browse/FLINK-18641
Project: Flink
Issue Type: Bug
Components: Runtime / Checkpointing
Affects Versions: 1.11.0
Reporter: Brian Zhou
https://github.com/pravega/flink-connectors is a Pravega connector for Flink.
The ReaderCheckpointHook[1] class uses the Flink `MasterTriggerRestoreHook`
interface to trigger the Pravega checkpoint during Flink checkpoints to make
sure the data recovery. The checkpoint recovery tests are running fine in Flink
1.10, but it has below issues in Flink 1.11 causing the tests time out.
Error stacktrace:
{code}
2020-07-09 15:39:39,999 30945 [jobmanager-future-thread-5] WARN
o.a.f.runtime.jobmaster.JobMaster - Error while processing checkpoint
acknowledgement message
org.apache.flink.runtime.checkpoint.CheckpointException: Could not finalize the
pending checkpoint 3. Failure reason: Failure to finalize checkpoint.
at
org.apache.flink.runtime.checkpoint.CheckpointCoordinator.completePendingCheckpoint(CheckpointCoordinator.java:1033)
at
org.apache.flink.runtime.checkpoint.CheckpointCoordinator.receiveAcknowledgeMessage(CheckpointCoordinator.java:948)
at
org.apache.flink.runtime.scheduler.SchedulerBase.lambda$acknowledgeCheckpoint$4(SchedulerBase.java:802)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.util.SerializedThrowable: Pending checkpoint has
not been fully acknowledged yet
at
org.apache.flink.util.Preconditions.checkState(Preconditions.java:195)
at
org.apache.flink.runtime.checkpoint.PendingCheckpoint.finalizeCheckpoint(PendingCheckpoint.java:298)
at
org.apache.flink.runtime.checkpoint.CheckpointCoordinator.completePendingCheckpoint(CheckpointCoordinator.java:1021)
... 9 common frames omitted
{code}
More detail in this mailing thread:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Pravega-connector-cannot-recover-from-the-checkpoint-due-to-quot-Failure-to-finalize-checkpoint-quot-td36652.html
Also in https://github.com/pravega/flink-connectors/issues/387
--
This message was sent by Atlassian Jira
(v8.3.4#803005)