[
https://issues.apache.org/jira/browse/FLINK-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041478#comment-15041478
]
ASF GitHub Bot commented on FLINK-2111:
---------------------------------------
Github user tillrohrmann commented on the pull request:
https://github.com/apache/flink/pull/750#issuecomment-161949525
While fixing the `JobManagerTest` I noticed the following: if a job was
stopped while it was still in the state `SCHEDULED` or `DEPLOYING`, the
caller received a `StoppingSuccess`. However, the stop was never executed and
the job later switched to `RUNNING`.
The same can be observed if the job is in state `RESTARTING`. Stopping a
restarting job does nothing even though you receive a `StoppingSuccess`
message. The job will later be redeployed.
As a user I would expect the job to be stopped immediately, or at least at
the next possible moment (e.g. once it is deployed). Otherwise I would expect
the system to tell me that stopping is not possible at the moment.
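To make the expectation above concrete, here is a minimal sketch of a stop
request that is either applied, remembered until the job is running, or
rejected. `PendingStopSketch`, its `State` enum and the `StopReply` values are
hypothetical names for illustration only; this is not the code in the pull
request and not Flink's actual job or execution state machine.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical, simplified model of a job; not Flink's actual ExecutionGraph.
final class PendingStopSketch {
    enum State { SCHEDULED, DEPLOYING, RUNNING, RESTARTING, FINISHED }
    enum StopReply { STOP_SENT, STOP_PENDING, STOP_REJECTED }

    // States in which the sources cannot receive a stop signal yet.
    private static final Set<State> NOT_YET_RUNNING =
            EnumSet.of(State.SCHEDULED, State.DEPLOYING, State.RESTARTING);

    private State state;
    private boolean stopRequested;
    private boolean stopSignalSent;

    PendingStopSketch(State initial) {
        this.state = initial;
    }

    // Reports what actually happened instead of an unconditional success.
    StopReply stop() {
        if (state == State.RUNNING) {
            stopSignalSent = true;
            return StopReply.STOP_SENT;
        }
        if (NOT_YET_RUNNING.contains(state)) {
            stopRequested = true; // remembered until the job is running
            return StopReply.STOP_PENDING;
        }
        return StopReply.STOP_REJECTED; // e.g. the job has already finished
    }

    // Applies a remembered stop request at the next possible moment.
    void transitionTo(State next) {
        state = next;
        if (next == State.RUNNING && stopRequested) {
            stopSignalSent = true;
            stopRequested = false;
        }
    }

    boolean stopSignalSent() {
        return stopSignalSent;
    }
}
```

With this shape, a stop issued during `DEPLOYING` answers `STOP_PENDING` and
takes effect on the switch to `RUNNING`, so the caller is never told that a
stop succeeded when nothing was done.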
A similar question is what happens if only a subset of the sources is
deployed and in the state `RUNNING`. In that case the undeployed sources
would not be notified of the stop signal and would thus be deployed
normally.
Furthermore, what happens if the `stop` method of the `SourceFunction`
throws an unchecked exception? If I'm not mistaken, then this will only get
logged. But shouldn't the task be cancelled in such a situation because the
state cannot be guaranteed to be consistent anymore?
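One possible shape for this, as a rough sketch only: catch the unchecked
exception around the `stop` call and escalate to a cancel instead of merely
logging it. `StoppableSource`, `stopOrCancel` and the `cancelTask` callback
are hypothetical stand-ins and not the interfaces used in the pull request.

```java
// Hypothetical sketch; the interface and method names are assumptions.
final class StopEscalationSketch {
    // Stand-in for a source function that supports a clean stop.
    interface StoppableSource {
        void stop();
    }

    enum Outcome { STOPPED, CANCELED }

    // Tries a clean stop and falls back to cancelling the task if stop()
    // fails, because the source's state can no longer be trusted.
    static Outcome stopOrCancel(StoppableSource source, Runnable cancelTask) {
        try {
            source.stop();
            return Outcome.STOPPED;
        } catch (RuntimeException e) {
            System.err.println("stop() failed, cancelling task: " + e);
            cancelTask.run();
            return Outcome.CANCELED;
        }
    }
}
```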
The cases that a `Task` is not `Stoppable` and that a `Task` cannot be found
on the `TaskManager` are treated identically by the `Execution`. Both cause a
`TaskOperationResult(executionID, false, message)` to be sent back to the
`Execution`, where it is logged that the stopping call "did not find the
task". I think it would be good to differentiate the two cases.
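As an illustration of what differentiating the two cases could look like, the
sketch below returns a distinct result for "not found" and "not stoppable".
The `StopResult` enum, the task registry and the nested interfaces are
hypothetical; the existing `TaskOperationResult` only carries a boolean and a
message, and this is not a proposal for its exact replacement.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a task registry on the TaskManager side.
final class StopResultSketch {
    enum StopResult { STOP_SIGNAL_SENT, TASK_NOT_FOUND, TASK_NOT_STOPPABLE }

    interface Task { }

    interface Stoppable {
        void stop();
    }

    // A task that supports stopping; used for illustration and testing.
    static final class StoppableTask implements Task, Stoppable {
        boolean stopped;

        @Override
        public void stop() {
            stopped = true;
        }
    }

    private final Map<String, Task> tasks = new HashMap<>();

    void register(String executionId, Task task) {
        tasks.put(executionId, task);
    }

    // Returns a distinct result per failure cause so the caller can log
    // "task not found" and "task not stoppable" separately.
    StopResult stopTask(String executionId) {
        Task task = tasks.get(executionId);
        if (task == null) {
            return StopResult.TASK_NOT_FOUND;
        }
        if (!(task instanceof Stoppable)) {
            return StopResult.TASK_NOT_STOPPABLE;
        }
        ((Stoppable) task).stop();
        return StopResult.STOP_SIGNAL_SENT;
    }
}
```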
> Add "stop" signal to cleanly shutdown streaming jobs
> ----------------------------------------------------
>
> Key: FLINK-2111
> URL: https://issues.apache.org/jira/browse/FLINK-2111
> Project: Flink
> Issue Type: Improvement
> Components: Distributed Runtime, JobManager, Local Runtime,
> Streaming, TaskManager, Webfrontend
> Reporter: Matthias J. Sax
> Assignee: Matthias J. Sax
> Priority: Minor
>
> Currently, streaming jobs can only be stopped using the "cancel" command,
> which is a "hard" stop with no clean shutdown.
> The newly introduced "stop" signal will only affect streaming source tasks
> such that the sources can stop emitting data and shut down cleanly, resulting
> in a clean shutdown of the whole streaming job.
> This feature is a prerequisite for
> https://issues.apache.org/jira/browse/FLINK-1929
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)