[
https://issues.apache.org/jira/browse/FLINK-16770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17072451#comment-17072451
]
Robert Metzger commented on FLINK-16770:
----------------------------------------
This is what I have understood so far:
- The test searches for the checkpoint directory, but no checkpoint exists.
- It seems that checkpoint N is not retained if checkpoint N+1 is triggered
and the job is cancelled immediately afterwards.
The job has checkpoint retention on cancellation enabled.
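Proof (checkpoint 7 completes, checkpoint 8 is triggered, and the job switches
to CANCELLING about 100 ms later):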
{code}
$ cat flink-vsts-standalonesession-0-fv-az53.log | grep "CheckpointCoo\|job.lastCheckpointExternalPath\|switched from state"
2020-04-01 06:30:18,805 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Job General
purpose test job (c34d2f91cf100e020226725452b5000a) switched from state CREATED
to RUNNING.
localhost.jobmanager.General purpose test job.lastCheckpointExternalPath: n/a
2020-04-01 06:30:19,571 INFO
org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Checkpoint
triggering task Source: EventSource -> Timestamps/Watermarks (1/4) of job
c34d2f91cf100e020226725452b5000a is not in state RUNNING but DEPLOYING instead.
Aborting checkpoint.
localhost.jobmanager.General purpose test job.lastCheckpointExternalPath: n/a
2020-04-01 06:30:20,597 INFO
org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Triggering
checkpoint 1 @ 1585722620570 for job c34d2f91cf100e020226725452b5000a.
2020-04-01 06:30:21,170 INFO
org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Completed
checkpoint 1 for job c34d2f91cf100e020226725452b5000a (158574 bytes in 597 ms).
localhost.jobmanager.General purpose test job.lastCheckpointExternalPath:
file:/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/chk-1
2020-04-01 06:30:21,570 INFO
org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Triggering
checkpoint 2 @ 1585722621570 for job c34d2f91cf100e020226725452b5000a.
2020-04-01 06:30:21,689 INFO
org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Completed
checkpoint 2 for job c34d2f91cf100e020226725452b5000a (274341 bytes in 113 ms).
localhost.jobmanager.General purpose test job.lastCheckpointExternalPath:
file:/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/chk-2
2020-04-01 06:30:22,571 INFO
org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Triggering
checkpoint 3 @ 1585722622570 for job c34d2f91cf100e020226725452b5000a.
2020-04-01 06:30:22,689 INFO
org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Completed
checkpoint 3 for job c34d2f91cf100e020226725452b5000a (326291 bytes in 118 ms).
localhost.jobmanager.General purpose test job.lastCheckpointExternalPath:
file:/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/chk-3
2020-04-01 06:30:23,571 INFO
org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Triggering
checkpoint 4 @ 1585722623570 for job c34d2f91cf100e020226725452b5000a.
2020-04-01 06:30:23,650 INFO
org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Completed
checkpoint 4 for job c34d2f91cf100e020226725452b5000a (341697 bytes in 78 ms).
localhost.jobmanager.General purpose test job.lastCheckpointExternalPath:
file:/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/chk-4
2020-04-01 06:30:24,570 INFO
org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Triggering
checkpoint 5 @ 1585722624570 for job c34d2f91cf100e020226725452b5000a.
2020-04-01 06:30:24,643 INFO
org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Completed
checkpoint 5 for job c34d2f91cf100e020226725452b5000a (345026 bytes in 72 ms).
localhost.jobmanager.General purpose test job.lastCheckpointExternalPath:
file:/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/chk-5
2020-04-01 06:30:25,571 INFO
org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Triggering
checkpoint 6 @ 1585722625570 for job c34d2f91cf100e020226725452b5000a.
2020-04-01 06:30:25,659 INFO
org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Completed
checkpoint 6 for job c34d2f91cf100e020226725452b5000a (347049 bytes in 88 ms).
localhost.jobmanager.General purpose test job.lastCheckpointExternalPath:
file:/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/chk-6
2020-04-01 06:30:26,571 INFO
org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Triggering
checkpoint 7 @ 1585722626570 for job c34d2f91cf100e020226725452b5000a.
2020-04-01 06:30:26,633 INFO
org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Completed
checkpoint 7 for job c34d2f91cf100e020226725452b5000a (349427 bytes in 60 ms).
localhost.jobmanager.General purpose test job.lastCheckpointExternalPath:
file:/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/chk-7
2020-04-01 06:30:27,570 INFO
org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Triggering
checkpoint 8 @ 1585722627570 for job c34d2f91cf100e020226725452b5000a.
2020-04-01 06:30:27,677 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Job General
purpose test job (c34d2f91cf100e020226725452b5000a) switched from state RUNNING
to CANCELLING.
2020-04-01 06:30:28,131 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Job General
purpose test job (c34d2f91cf100e020226725452b5000a) switched from state
CANCELLING to CANCELED.
2020-04-01 06:30:28,131 INFO
org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Stopping
checkpoint coordinator for job c34d2f91cf100e020226725452b5000a.
2020-04-01 06:30:28,131 INFO
org.apache.flink.runtime.checkpoint.CompletedCheckpoint [] - Checkpoint
with ID 8 at
'file:/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/chk-8'
not discarded.
2020-04-01 06:30:30,758 INFO
org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Starting job
597e0c903b4fe886a0dfab962a9bcf93 from savepoint file://. ()
{code}
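The timeline in the log above can also be extracted mechanically. The following is a minimal Python sketch, not part of Flink or of the test scripts: the function name and the constant are hypothetical, and the patterns assume the message wording shown in the log above. Whitespace is collapsed first, so entries that were wrapped across lines still match.

```python
import re

# Assumption: this is the exact wording of the state transition in the log.
CANCEL_MARKER = "switched from state RUNNING to CANCELLING"


def checkpoints_before_cancel(log_text):
    """Return (last_completed_id, in_flight_ids) as of the cancellation.

    last_completed_id is None if no checkpoint completed before the job
    switched to CANCELLING. in_flight_ids lists checkpoints that were
    triggered but had not completed at that point.
    """
    # Join wrapped entries into one long line.
    flat = " ".join(log_text.split())
    cut = flat.find(CANCEL_MARKER)
    if cut == -1:
        raise ValueError("log contains no RUNNING -> CANCELLING transition")
    head = flat[:cut]  # only look at what happened before cancellation

    triggered = {int(n) for n in re.findall(r"Triggering checkpoint (\d+) @", head)}
    completed = {int(n) for n in re.findall(r"Completed checkpoint (\d+) for job", head)}

    last_completed = max(completed) if completed else None
    return last_completed, sorted(triggered - completed)


if __name__ == "__main__":
    import sys

    # Usage: python timeline.py flink-vsts-standalonesession-0-fv-az53.log
    with open(sys.argv[1], encoding="utf-8") as handle:
        print(checkpoints_before_cancel(handle.read()))
```

For the log above, this should report checkpoint 7 as the last completed one and checkpoint 8 as still in flight when the job switched to CANCELLING.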
Before cancellation, this is the content of the checkpoint dir:
{code}
2020-04-01T06:30:25.4552903Z
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir
2020-04-01T06:30:25.4554109Z
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a
2020-04-01T06:30:25.4555100Z
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/shared
2020-04-01T06:30:25.4556094Z
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/taskowned
2020-04-01T06:30:25.4557084Z
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/chk-5
2020-04-01T06:30:25.4558165Z
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/chk-5/2ea88999-1416-4880-9556-8e09e160bb41
2020-04-01T06:30:25.4559321Z
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/chk-5/b2bb185d-dab4-430e-990a-295eb905d97d
2020-04-01T06:30:25.4560483Z
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/chk-5/24013bb5-c371-44ff-83ab-dde53a8cf1cd
2020-04-01T06:30:25.4561549Z
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/chk-5/_metadata
2020-04-01T06:30:25.4563080Z
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/chk-5/6047540d-e336-4ed1-ab67-80ee222135c3
2020-04-01T06:30:25.4564570Z
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/chk-5/51c50ce7-909b-4525-a098-cabd5e0377d9
2020-04-01T06:30:25.4565810Z
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/chk-5/c818c3cd-e91d-454f-99ce-6c61c6f55aa7
2020-04-01T06:30:25.4567043Z
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/chk-5/b8c8a828-2a7e-4197-a413-3b1a777b2788
2020-04-01T06:30:25.4568272Z
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/chk-5/faafce1d-f281-4b76-a11b-167b7fb1e22e
2020-04-01T06:30:25.4569619Z
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/chk-5/1dd494fa-3887-4d02-9567-79ffeb79afe6
2020-04-01T06:30:25.4570991Z
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/chk-5/e11d0e0e-cb1e-44b3-a593-fc96cf9173bb
2020-04-01T06:30:25.4572877Z
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/chk-5/b2ee6c2a-d980-470b-83a1-eddb8fb83a81
2020-04-01T06:30:25.4574130Z
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/chk-5/a2c87162-f591-4a51-9caf-7594c7f110d7
2020-04-01T06:30:25.4575357Z
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/chk-5/7e6efd4c-9dee-478c-8ff3-218fe461030b
2020-04-01T06:30:25.4576568Z
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/chk-5/e2dbe0f9-8120-4fca-8833-e53b5e8ec1d1
2020-04-01T06:30:25.4589585Z
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/chk-5/6408dffc-99ed-43f0-afbd-5d3836af58a5
2020-04-01T06:30:25.4591144Z
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/chk-5/6b27e2f2-cdcc-417e-91e2-6bbf5dfcaeab
2020-04-01T06:30:25.4592357Z
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/chk-5/c47b20ed-e29a-4a78-a9ad-8e30a80765b0
2020-04-01T06:30:25.4593849Z
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/chk-5/140167a5-8d86-4a8a-97cc-91a8c272ed0a
2020-04-01T06:30:25.4595121Z
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/chk-5/050b237c-2a3c-4dec-a242-8fe93558d459
2020-04-01T06:30:25.4596528Z
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/chk-5/c60c4f87-f437-466b-801a-a290b0e07f7c
2020-04-01T06:30:25.4597721Z
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/chk-5/21f99bc1-21c1-44bf-91c9-312790051844
2020-04-01T06:30:25.4598900Z
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/chk-5/a4c763a3-bf51-4d67-88de-377019cd489a
2020-04-01T06:30:25.4600099Z
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/chk-5/91cfe5fd-51ef-46cd-ba15-fada90d1ced5
2020-04-01T06:30:25.4601551Z
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/chk-5/b58caf97-a2ca-47d6-bd32-a7e005e54d01
2020-04-01T06:30:25.4602772Z
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/chk-5/fa2d79e2-f6da-4ec7-9450-8d9eaba4a689
2020-04-01T06:30:25.4604562Z
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/chk-5/37e56b80-4772-40fb-a605-9427ec310106
2020-04-01T06:30:25.4605790Z
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/chk-5/6f93dfc5-5440-44f6-aade-6e68542cbb14
2020-04-01T06:30:25.4607168Z
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/chk-5/d2dadd16-eda7-47e2-b31f-babf62fb59a5
2020-04-01T06:30:25.4608385Z
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/chk-5/6117955e-6513-4a8e-8e24-dfab248491fd
2020-04-01T06:30:25.4609791Z
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/chk-5/0f038eb9-1c9a-45b8-8e67-ffdac782a7e3
2020-04-01T06:30:25.4611002Z
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/chk-5/183e1715-3c31-4946-9153-7ce4450e056c
2020-04-01T06:30:25.4612183Z
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/chk-5/6f819cea-cb94-42ef-819c-9f2ef21e145f
2020-04-01T06:30:25.4613496Z
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/chk-5/8fce4fc7-20e3-4e79-a329-42b173cea697
2020-04-01T06:30:25.4614687Z
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/chk-5/4f2e6e30-1f14-4ac6-a956-8de22d168bd9
2020-04-01T06:30:25.4618606Z
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/chk-5/61d27b66-7897-43b9-ad40-440ca5fafaf3
2020-04-01T06:30:25.4619845Z
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/chk-5/723d4610-7004-4b52-96b6-15af5b311287
{code}
After cancellation:
{code}
2020-04-01T06:30:27.8809772Z
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir
2020-04-01T06:30:27.8811029Z
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a
2020-04-01T06:30:27.8812272Z
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/shared
2020-04-01T06:30:27.8813488Z
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-08993937085/externalized-chckpt-e2e-backend-dir/c34d2f91cf100e020226725452b5000a/taskowned
{code}
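The listing above shows that no chk-* directory is left after cancellation. A minimal Python sketch of the same check follows. It is not the actual test script, and the function names are hypothetical. It assumes the layout shown in the listings: one chk-N directory per retained checkpoint, each containing a _metadata file. It fails loudly when nothing is retained instead of returning an empty path.

```python
import re
from pathlib import Path


def retained_checkpoints(job_dir):
    """Return the chk-N directories under job_dir that contain _metadata,
    ordered by checkpoint ID (numerically, so chk-10 sorts after chk-5)."""
    found = []
    for child in Path(job_dir).glob("chk-*"):
        match = re.fullmatch(r"chk-(\d+)", child.name)
        if match and (child / "_metadata").is_file():
            found.append((int(match.group(1)), child))
    return [path for _, path in sorted(found, key=lambda item: item[0])]


def latest_checkpoint_uri(job_dir):
    """Return a file:/// URI for the newest retained checkpoint.

    Raises FileNotFoundError if no checkpoint was retained, so the caller
    never passes an empty path on to a restore command.
    """
    checkpoints = retained_checkpoints(job_dir)
    if not checkpoints:
        raise FileNotFoundError(
            f"no retained checkpoint with _metadata under {job_dir}"
        )
    return checkpoints[-1].resolve().as_uri()
```

Run against the job directory in the "after cancellation" listing, `retained_checkpoints` would return an empty list, because only shared and taskowned remain.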
I believe there is something wrong in the checkpoint coordinator.
> Resuming Externalized Checkpoint (rocks, incremental, scale up) end-to-end
> test fails with no such file
> -------------------------------------------------------------------------------------------------------
>
> Key: FLINK-16770
> URL: https://issues.apache.org/jira/browse/FLINK-16770
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Checkpointing, Tests
> Affects Versions: 1.11.0
> Reporter: Zhijiang
> Assignee: Robert Metzger
> Priority: Blocker
> Labels: test-stability
> Fix For: 1.11.0
>
> Attachments: flink-vsts-standalonesession-0-fv-az53.log
>
>
> The log:
> [https://dev.azure.com/rmetzger/Flink/_build/results?buildId=6603&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=1e2bbe5b-4657-50be-1f07-d84bfce5b1f5]
>
> A similar problem was reported in
> https://issues.apache.org/jira/browse/FLINK-16561, but for the case of no
> parallelism change, whereas this case is for scaling up. I am not sure whether
> the root cause is the same.
> {code:java}
> 2020-03-25T06:50:31.3894841Z Running 'Resuming Externalized Checkpoint
> (rocks, incremental, scale up) end-to-end test'
> 2020-03-25T06:50:31.3895308Z
> ==============================================================================
> 2020-03-25T06:50:31.3907274Z TEST_DATA_DIR:
> /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-31390197304
> 2020-03-25T06:50:31.5500274Z Flink dist directory:
> /home/vsts/work/1/s/flink-dist/target/flink-1.11-SNAPSHOT-bin/flink-1.11-SNAPSHOT
> 2020-03-25T06:50:31.6354639Z Starting cluster.
> 2020-03-25T06:50:31.8871932Z Starting standalonesession daemon on host
> fv-az655.
> 2020-03-25T06:50:33.5021784Z Starting taskexecutor daemon on host fv-az655.
> 2020-03-25T06:50:33.5152274Z Waiting for Dispatcher REST endpoint to come
> up...
> 2020-03-25T06:50:34.5498116Z Waiting for Dispatcher REST endpoint to come
> up...
> 2020-03-25T06:50:35.6031346Z Waiting for Dispatcher REST endpoint to come
> up...
> 2020-03-25T06:50:36.9848425Z Waiting for Dispatcher REST endpoint to come
> up...
> 2020-03-25T06:50:38.0283377Z Dispatcher REST endpoint is up.
> 2020-03-25T06:50:38.0285490Z Running externalized checkpoints test, with
> ORIGINAL_DOP=2 NEW_DOP=4 and STATE_BACKEND_TYPE=rocks
> STATE_BACKEND_FILE_ASYNC=true STATE_BACKEND_ROCKSDB_INCREMENTAL=true
> SIMULATE_FAILURE=false ...
> 2020-03-25T06:50:46.1754645Z Job (b8cb04e4b1e730585bc616aa352866d0) is
> running.
> 2020-03-25T06:50:46.1758132Z Waiting for job
> (b8cb04e4b1e730585bc616aa352866d0) to have at least 1 completed checkpoints
> ...
> 2020-03-25T06:50:46.3478276Z Waiting for job to process up to 200 records,
> current progress: 173 records ...
> 2020-03-25T06:50:49.6332988Z Cancelling job b8cb04e4b1e730585bc616aa352866d0.
> 2020-03-25T06:50:50.4875673Z Cancelled job b8cb04e4b1e730585bc616aa352866d0.
> 2020-03-25T06:50:50.5468230Z ls: cannot access
> '/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-31390197304/externalized-chckpt-e2e-backend-dir/b8cb04e4b1e730585bc616aa352866d0/chk-[1-9]*/_metadata':
> No such file or directory
> 2020-03-25T06:50:50.5606260Z Restoring job with externalized checkpoint at .
> ...
> 2020-03-25T06:50:58.4728245Z
> 2020-03-25T06:50:58.4732663Z
> ------------------------------------------------------------
> 2020-03-25T06:50:58.4735785Z The program finished with the following
> exception:
> 2020-03-25T06:50:58.4737759Z
> 2020-03-25T06:50:58.4742666Z
> org.apache.flink.client.program.ProgramInvocationException: The main method
> caused an error: java.util.concurrent.ExecutionException:
> org.apache.flink.runtime.client.JobSubmissionException: Failed to submit
> JobGraph.
> 2020-03-25T06:50:58.4746274Z at
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:335)
> 2020-03-25T06:50:58.4749954Z at
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:205)
> 2020-03-25T06:50:58.4752753Z at
> org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:142)
> 2020-03-25T06:50:58.4755400Z at
> org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:659)
> 2020-03-25T06:50:58.4757862Z at
> org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:210)
> 2020-03-25T06:50:58.4760282Z at
> org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:890)
> 2020-03-25T06:50:58.4763591Z at
> org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:963)
> 2020-03-25T06:50:58.4764274Z at
> java.security.AccessController.doPrivileged(Native Method)
> 2020-03-25T06:50:58.4764809Z at
> javax.security.auth.Subject.doAs(Subject.java:422)
> 2020-03-25T06:50:58.4765434Z at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
> 2020-03-25T06:50:58.4766180Z at
> org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
> 2020-03-25T06:50:58.4773549Z at
> org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:963)
> 2020-03-25T06:50:58.4774502Z Caused by: java.lang.RuntimeException:
> java.util.concurrent.ExecutionException:
> org.apache.flink.runtime.client.JobSubmissionException: Failed to submit
> JobGraph.
> 2020-03-25T06:50:58.4775382Z at
> org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:276)
> 2020-03-25T06:50:58.4776163Z at
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1741)
> 2020-03-25T06:50:58.4777706Z at
> org.apache.flink.client.program.StreamContextEnvironment.executeAsync(StreamContextEnvironment.java:90)
> 2020-03-25T06:50:58.4778334Z at
> org.apache.flink.client.program.StreamContextEnvironment.execute(StreamContextEnvironment.java:58)
> 2020-03-25T06:50:58.4779007Z at
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1620)
> 2020-03-25T06:50:58.4779654Z at
> org.apache.flink.streaming.tests.DataStreamAllroundTestProgram.main(DataStreamAllroundTestProgram.java:215)
> 2020-03-25T06:50:58.4780371Z at
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2020-03-25T06:50:58.4784367Z at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2020-03-25T06:50:58.4785063Z at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2020-03-25T06:50:58.4785557Z at
> java.lang.reflect.Method.invoke(Method.java:498)
> 2020-03-25T06:50:58.4786204Z at
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:321)
> 2020-03-25T06:50:58.4786547Z ... 11 more
> 2020-03-25T06:50:58.4787007Z Caused by:
> java.util.concurrent.ExecutionException:
> org.apache.flink.runtime.client.JobSubmissionException: Failed to submit
> JobGraph.
> 2020-03-25T06:50:58.4787717Z at
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> 2020-03-25T06:50:58.4788203Z at
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
> 2020-03-25T06:50:58.4788835Z at
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1736)
> 2020-03-25T06:50:58.4789362Z ... 20 more
> 2020-03-25T06:50:58.4789720Z Caused by:
> org.apache.flink.runtime.client.JobSubmissionException: Failed to submit
> JobGraph.
> 2020-03-25T06:50:58.4790467Z at
> org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$7(RestClusterClient.java:359)
> 2020-03-25T06:50:58.4791087Z at
> java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:884)
> 2020-03-25T06:50:58.4791650Z at
> java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:866)
> 2020-03-25T06:50:58.4792560Z at
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
> 2020-03-25T06:50:58.4793617Z at
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
> 2020-03-25T06:50:58.4794496Z at
> org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$8(FutureUtils.java:274)
> 2020-03-25T06:50:58.4795255Z at
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
> 2020-03-25T06:50:58.4796264Z at
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
> 2020-03-25T06:50:58.4796867Z at
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
> 2020-03-25T06:50:58.4797439Z at
> java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:575)
> 2020-03-25T06:50:58.4798000Z at
> java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:943)
> 2020-03-25T06:50:58.4798589Z at
> java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456)
> 2020-03-25T06:50:58.4799162Z at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 2020-03-25T06:50:58.4799727Z at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 2020-03-25T06:50:58.4800210Z at java.lang.Thread.run(Thread.java:748)
> 2020-03-25T06:50:58.4800767Z Caused by:
> org.apache.flink.runtime.rest.util.RestClientException: [Internal server
> error., <Exception on server side:
> 2020-03-25T06:50:58.4801351Z
> org.apache.flink.runtime.client.JobSubmissionException: Failed to submit job.
> 2020-03-25T06:50:58.4801938Z at
> org.apache.flink.runtime.dispatcher.Dispatcher.lambda$internalSubmitJob$3(Dispatcher.java:336)
> 2020-03-25T06:50:58.4803660Z at
> java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836)
> 2020-03-25T06:50:58.4804555Z at
> java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811)
> 2020-03-25T06:50:58.4805235Z at
> java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456)
> 2020-03-25T06:50:58.4805839Z at
> akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
> 2020-03-25T06:50:58.4806515Z at
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
> 2020-03-25T06:50:58.4807184Z at
> akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> 2020-03-25T06:50:58.4807807Z at
> akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> 2020-03-25T06:50:58.4808417Z at
> akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> 2020-03-25T06:50:58.4809055Z at
> akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 2020-03-25T06:50:58.4809783Z Caused by: java.lang.RuntimeException:
> org.apache.flink.runtime.client.JobExecutionException: Could not set up
> JobManager
> 2020-03-25T06:50:58.4810756Z at
> org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:36)
> 2020-03-25T06:50:58.4811444Z at
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
> 2020-03-25T06:50:58.4811937Z ... 6 more
> 2020-03-25T06:50:58.4812414Z Caused by:
> org.apache.flink.runtime.client.JobExecutionException: Could not set up
> JobManager
> 2020-03-25T06:50:58.4813330Z at
> org.apache.flink.runtime.jobmaster.JobManagerRunnerImpl.<init>(JobManagerRunnerImpl.java:152)
> 2020-03-25T06:50:58.4814154Z at
> org.apache.flink.runtime.dispatcher.DefaultJobManagerRunnerFactory.createJobManagerRunner(DefaultJobManagerRunnerFactory.java:84)
> 2020-03-25T06:50:58.4814846Z at
> org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$6(Dispatcher.java:379)
> 2020-03-25T06:50:58.4815622Z at
> org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:34)
> 2020-03-25T06:50:58.4816074Z ... 7 more
> 2020-03-25T06:50:58.4816924Z Caused by: java.io.IOException: Cannot access
> file system for checkpoint/savepoint path 'file://.'.
> 2020-03-25T06:50:58.4817673Z at
> org.apache.flink.runtime.state.filesystem.AbstractFsCheckpointStorage.resolveCheckpointPointer(AbstractFsCheckpointStorage.java:233)
> 2020-03-25T06:50:58.4818450Z at
> org.apache.flink.runtime.state.filesystem.AbstractFsCheckpointStorage.resolveCheckpoint(AbstractFsCheckpointStorage.java:110)
> 2020-03-25T06:50:58.4819276Z at
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreSavepoint(CheckpointCoordinator.java:1312)
> 2020-03-25T06:50:58.4819943Z at
> org.apache.flink.runtime.scheduler.SchedulerBase.tryRestoreExecutionGraphFromSavepoint(SchedulerBase.java:314)
> 2020-03-25T06:50:58.4820633Z at
> org.apache.flink.runtime.scheduler.SchedulerBase.createAndRestoreExecutionGraph(SchedulerBase.java:247)
> 2020-03-25T06:50:58.4821258Z at
> org.apache.flink.runtime.scheduler.SchedulerBase.<init>(SchedulerBase.java:223)
> 2020-03-25T06:50:58.4821862Z at
> org.apache.flink.runtime.scheduler.DefaultScheduler.<init>(DefaultScheduler.java:118)
> 2020-03-25T06:50:58.4822505Z at
> org.apache.flink.runtime.scheduler.DefaultSchedulerFactory.createInstance(DefaultSchedulerFactory.java:103)
> 2020-03-25T06:50:58.4823115Z at
> org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:282)
> 2020-03-25T06:50:58.4823665Z at
> org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:270)
> 2020-03-25T06:50:58.4824485Z at
> org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:98)
> 2020-03-25T06:50:58.4825597Z at
> org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:40)
> 2020-03-25T06:50:58.4826400Z at
> org.apache.flink.runtime.jobmaster.JobManagerRunnerImpl.<init>(JobManagerRunnerImpl.java:146)
> 2020-03-25T06:50:58.4826919Z ... 10 more
> 2020-03-25T06:50:58.4829018Z Caused by: java.io.IOException: Found local file
> path with authority '.' in path 'file://.'. Hint: Did you forget a slash?
> (correct path would be 'file:///.')
> 2020-03-25T06:50:58.4829875Z at
> org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:441)
> 2020-03-25T06:50:58.4830364Z at
> org.apache.flink.core.fs.FileSystem.get(FileSystem.java:389)
> 2020-03-25T06:50:58.4830807Z at
> org.apache.flink.core.fs.Path.getFileSystem(Path.java:292)
> 2020-03-25T06:50:58.4831408Z at
> org.apache.flink.runtime.state.filesystem.AbstractFsCheckpointStorage.resolveCheckpointPointer(AbstractFsCheckpointStorage.java:230)
> 2020-03-25T06:50:58.4832021Z ... 22 more
> 2020-03-25T06:50:58.4832151Z
> 2020-03-25T06:50:58.4832356Z End of exception on server side>]
> 2020-03-25T06:50:58.4832720Z at
> org.apache.flink.runtime.rest.RestClient.parseResponse(RestClient.java:390)
> 2020-03-25T06:50:58.4833238Z at
> org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$3(RestClient.java:374)
> 2020-03-25T06:50:58.4833884Z at
> java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:966)
> 2020-03-25T06:50:58.4834376Z at
> java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:940)
> 2020-03-25T06:50:58.4834724Z ... 4 more
> 2020-03-25T06:50:58.5042321Z Resuming from externalized checkpoint job could
> not be started.
> 2020-03-25T06:50:58.5044210Z [FAIL] Test script contains errors.
> 2020-03-25T06:50:58.5052826Z Checking of logs skipped.
> 2020-03-25T06:50:58.5053164Z
> 2020-03-25T06:50:58.5054116Z [FAIL] 'Resuming Externalized Checkpoint (rocks,
> incremental, scale up) end-to-end test' failed after 0 minutes and 27
> seconds! Test exited with exit code 1
> 2020-03-25T06:50:58.5054639Z
> 2020-03-25T06:50:58.8067813Z Stopping taskexecutor daemon (pid: 86888) on
> host fv-az655.
> 2020-03-25T06:50:59.0257270Z Stopping standalonesession daemon (pid: 86603)
> on host fv-az655.
> 2020-03-25T06:50:59.4920994Z
> 2020-03-25T06:50:59.5000014Z ##[error]Bash exited with code '1'.
> 2020-03-25T06:50:59.5015374Z ##[section]Finishing: Run e2e tests
> {code}