Github user tillrohrmann commented on the pull request:

    https://github.com/apache/flink/pull/750#issuecomment-170942707
  
    Good work @mjsax. I think the testing of the REST interface is sufficient, 
also without having a dedicated YARN test case.  
    
    I had some more comments concerning the failure case handling of stop 
calls. The first problem is still the handling of exceptions when calling 
`stop` on the `Invokable` in `Task.stopExecution`. The exception will only be 
logged but no further action is taken. This can lead to a situation where we 
have a corrupted state. I think, we should fail the task in such a situation.
    
    Additionally, the case that a task cannot be found on the `TaskManager` and 
that an exception occurs in `Task.stopExecution` are treated identically by 
sending a `TaskOperationResult` with `success == false` to the `JobManager`. On 
the `JobManager` side this will only be logged. I think the exception case 
should be handled differently. Failing the execution, for example.
    
    And it is still possible that you send a `StopJob` message to the 
`JobManager`, see that the job is in state `RUNNING`, then the `ExecutionGraph` 
switches to `RESTARTING`, and then the stop call is executed on the 
`ExecutionGraph` which won't have an effect. As a user you will receive a 
`StoppingSuccess` message but the job will simply be restarted. I think we 
should also allow stopping jobs when they are in the state `RESTARTING`.
    
    What do you think?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

Reply via email to