[
https://issues.apache.org/jira/browse/FLINK-4972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chesnay Schepler updated FLINK-4972:
------------------------------------
Description:
The CoordinatorShutdownTest verifies that the CheckpointCoordinator is properly
shut down when a job has succeeded or failed. For this purpose a job is
submitted to a cluster with or without TaskManagers, resulting in a successful
or failed job respectively. The ExecutionGraph is then retrieved, from which
the CheckpointCoordinator can be accessed.
This test relies on being able to access the ExecutionGraph of a finished job,
even though that graph is only accessible for a short time: until it is
archived and removed from the currentJobs map in the JobManager (JM). From that
point on you can only retrieve an ArchivedExecutionGraph, which no longer
contains the CheckpointCoordinator.
The tests should be changed to block the job execution, retrieve the
ExecutionGraph, resume the job and then verify the test conditions.
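The proposed sequence (block, retrieve, resume, verify) could look roughly like the sketch below. It is plain Java and does not use any Flink API: LiveGraph, the local currentJobs map, the job id "job-1" and the two latches are hypothetical stand-ins chosen for illustration, not the classes or the mechanism the actual fix uses.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class BlockingJobSketch {

    /** Hypothetical stand-in for the live graph that owns the coordinator. */
    static final class LiveGraph {
        private volatile boolean coordinatorShutDown = false;

        void shutDownCoordinator() { coordinatorShutDown = true; }

        boolean isCoordinatorShutDown() { return coordinatorShutDown; }
    }

    /** Runs the block / retrieve / resume / verify sequence once. */
    static boolean coordinatorShutDownAfterBlockedJob() {
        // Hypothetical stand-in for the registry of running jobs.
        Map<String, LiveGraph> currentJobs = new ConcurrentHashMap<>();
        CountDownLatch jobStarted = new CountDownLatch(1);
        CountDownLatch resume = new CountDownLatch(1);

        LiveGraph graph = new LiveGraph();
        currentJobs.put("job-1", graph);

        Thread job = new Thread(() -> {
            jobStarted.countDown();
            try {
                resume.await(); // the job stays "running" until the test releases it
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                // On termination: shut down the coordinator, then drop the live graph.
                graph.shutDownCoordinator();
                currentJobs.remove("job-1");
            }
        });
        job.start();

        try {
            // 1. Block: wait until the job is known to be running.
            if (!jobStarted.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("job did not start in time");
            }
            // 2. Retrieve: the job cannot finish yet, so the lookup cannot race with removal.
            LiveGraph retrieved = currentJobs.get("job-1");
            if (retrieved == null) {
                throw new IllegalStateException("live graph missing while job is blocked");
            }
            // 3. Resume: let the job finish and wait for it to terminate.
            resume.countDown();
            job.join(10_000);
            if (job.isAlive()) {
                throw new IllegalStateException("job did not finish in time");
            }
            // 4. Verify against the reference obtained in step 2.
            return retrieved.isCoordinatorShutDown() && !currentJobs.containsKey("job-1");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the job", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("coordinator shut down: " + coordinatorShutDownAfterBlockedJob());
    }
}
```

Because the reference is obtained while the job is still blocked, the check in step 4 no longer depends on how quickly the finished job is removed from the registry.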
was:
The CoordinatorShutdownTest verifies that the CheckpointCoordinator is properly
shutdown when a job has succeeded/failed. For this purpose a job is submitted
to a cluster without TaskManagers, resulting in immediate failure. The
ExecutionGraph is then retrieved, from which the CheckpointCoordinator can be
accessed.
This test relies on being able to access the ExecutionGraph for a finished job
even though it is only accessible for a short amount of time: until it was
archived and removed from the currentJobs map in the JM. From that point on you
can only retrieve an ArchivedExecutionGraph, which doesn't contain the
CheckpointCoordinator anymore.
The tests should be changed to block the job execution, retrieve the
ExecutionGraph, resume the job and then verify the test conditions.
> CoordinatorShutdownTest relies on race condition for success
> ------------------------------------------------------------
>
> Key: FLINK-4972
> URL: https://issues.apache.org/jira/browse/FLINK-4972
> Project: Flink
> Issue Type: Improvement
> Components: Tests
> Affects Versions: 1.2.0
> Reporter: Chesnay Schepler
> Assignee: Chesnay Schepler
> Fix For: 1.2.0