[ https://issues.apache.org/jira/browse/FLINK-4972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chesnay Schepler updated FLINK-4972:
------------------------------------
    Description: 
The CoordinatorShutdownTest verifies that the CheckpointCoordinator is properly 
shut down when a job has succeeded or failed. For this purpose, a job is 
submitted to a cluster with or without TaskManagers, resulting in a successful 
or failed job, respectively. The ExecutionGraph is then retrieved, from which 
the CheckpointCoordinator can be accessed.

This test relies on being able to access the ExecutionGraph of a finished job, 
even though it is only accessible for a short time: until it is archived and 
removed from the currentJobs map in the JobManager (JM). From that point on, 
only an ArchivedExecutionGraph can be retrieved, which no longer contains the 
CheckpointCoordinator.

The tests should be changed to block the job execution, retrieve the 
ExecutionGraph, resume the job and then verify the test conditions.
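
The block / retrieve / resume / verify sequence proposed above could look 
roughly like the following. This is only a minimal sketch using plain 
java.util.concurrent latches, not Flink's actual test utilities: the 
Coordinator class, the local currentJobs map, and runScenario are hypothetical 
stand-ins for the CheckpointCoordinator, the JM's job map, and the test body.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class BlockingJobSketch {

    /** Hypothetical stand-in for the CheckpointCoordinator (not a Flink class). */
    static final class Coordinator {
        private volatile boolean shutdown;
        void shutdown() { shutdown = true; }
        boolean isShutdown() { return shutdown; }
    }

    /** Runs block -> retrieve -> resume -> verify; returns true if all checks hold. */
    static boolean runScenario() {
        // Hypothetical stand-in for the currentJobs map held by the JobManager.
        Map<String, Coordinator> currentJobs = new ConcurrentHashMap<>();
        CountDownLatch jobStarted = new CountDownLatch(1);
        CountDownLatch mayFinish = new CountDownLatch(1);

        String jobId = "job-1";
        Coordinator coordinator = new Coordinator();
        currentJobs.put(jobId, coordinator);

        Thread job = new Thread(() -> {
            jobStarted.countDown();
            try {
                // Block the job until the test has retrieved its handle.
                mayFinish.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            // Job finished: shut down the coordinator, then "archive" the job.
            coordinator.shutdown();
            currentJobs.remove(jobId);
        });
        job.start();

        try {
            if (!jobStarted.await(10, TimeUnit.SECONDS)) {
                return false;
            }
            // Retrieve while the job is guaranteed to still be registered.
            Coordinator retrieved = currentJobs.get(jobId);
            boolean activeWhileRunning = retrieved != null && !retrieved.isShutdown();

            // Resume the job and wait for it to finish.
            mayFinish.countDown();
            job.join(10_000);

            // Verify on the reference obtained earlier, not on a fresh lookup.
            return activeWhileRunning
                    && retrieved.isShutdown()
                    && !currentJobs.containsKey(jobId);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("checks passed: " + runScenario());
    }
}
```

Because the reference is obtained while the job is still blocked, the 
verification no longer depends on winning a race against archiving.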

  was:
The CoordinatorShutdownTest verifies that the CheckpointCoordinator is properly 
shutdown when a job has succeeded/failed. For this purpose a job is submitted 
to a cluster without TaskManagers, resulting in immediate failure. The 
ExecutionGraph is then retrieved, from which the CheckpointCoordinator can be 
accessed.

This test relies on being able to access the ExecutionGraph for a finished job 
even though it is only accessible for a short amount of time: until it was 
archived and removed from the currentJobs map in the JM. From that point on you 
can only retrieve an ArchivedExecutionGraph, which doesn't contain the 
CheckpointCoordinator anymore.

The tests should be changed to block the job execution, retrieve the 
ExecutionGraph, resume the job and then verify the test conditions.


> CoordinatorShutdownTest relies on race condition for success
> ------------------------------------------------------------
>
>                 Key: FLINK-4972
>                 URL: https://issues.apache.org/jira/browse/FLINK-4972
>             Project: Flink
>          Issue Type: Improvement
>          Components: Tests
>    Affects Versions: 1.2.0
>            Reporter: Chesnay Schepler
>            Assignee: Chesnay Schepler
>             Fix For: 1.2.0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
