[jira] [Commented] (FLINK-10743) Use 0 processExitCode for ApplicationStatus.CANCELED

ASF GitHub Bot (JIRA) Mon, 10 Dec 2018 07:34:30 -0800


    [ https://issues.apache.org/jira/browse/FLINK-10743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16714917#comment-16714917 ]


ASF GitHub Bot commented on FLINK-10743:
----------------------------------------

tillrohrmann commented on a change in pull request #7004: [FLINK-10743] [runtime] Use 0 processExitCode for ApplicationStatus.CANCELED
URL: https://github.com/apache/flink/pull/7004#discussion_r240254518
 
 

 ##########
 File path: flink-runtime/src/main/java/org/apache/flink/runtime/clusterframework/ApplicationStatus.java
 ##########
 @@ -32,7 +32,7 @@
        FAILED(1443),
 
 Review comment:
   Technically speaking, I think that every `ApplicationStatus` value that does 
not indicate a failure of the Flink system should return `0`. `FAILED`, for 
example, is currently used if the job terminally failed, but that does not mean 
that the Flink cluster itself has failed. I would only return a non-zero exit 
code if Flink itself has failed (e.g. due to an initialization error). But this 
is probably something for a follow-up issue.
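
A minimal sketch of what the suggestion in this review comment could look like, written as a standalone enum. It is not Flink's `ApplicationStatus`: the name `ProposedStatus`, the `SYSTEM_FAILURE` constant, the `flinkItselfFailed` flag, and the non-zero value `1` are all hypothetical and only illustrate the idea that job-level outcomes exit with `0` while a failure of Flink itself does not.

```java
// Hypothetical sketch only; not the actual Flink ApplicationStatus enum.
enum ProposedStatus {
    SUCCEEDED(false),
    CANCELED(false),
    // The job failed terminally, but the Flink cluster itself worked as intended.
    FAILED(false),
    // Hypothetical constant for a failure of Flink itself, e.g. an initialization error.
    SYSTEM_FAILURE(true);

    private final boolean flinkItselfFailed;

    ProposedStatus(boolean flinkItselfFailed) {
        this.flinkItselfFailed = flinkItselfFailed;
    }

    /** Returns 0 unless Flink itself failed; the value 1 is an arbitrary assumption. */
    int processExitCode() {
        return flinkItselfFailed ? 1 : 0;
    }
}
```

Under these assumptions, `ProposedStatus.FAILED.processExitCode()` is `0`, and only `SYSTEM_FAILURE` yields a non-zero exit code.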

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> Use 0 processExitCode for ApplicationStatus.CANCELED
> ----------------------------------------------------
>
>                 Key: FLINK-10743
>                 URL: https://issues.apache.org/jira/browse/FLINK-10743
>             Project: Flink
>          Issue Type: Bug
>          Components: Cluster Management, Kubernetes, Mesos, YARN
>    Affects Versions: 1.6.3, 1.7.0
>            Reporter: Ufuk Celebi
>            Assignee: Ufuk Celebi
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 1.8.0
>
>
> {{org.apache.flink.runtime.clusterframework.ApplicationStatus}} is used to 
> map {{org.apache.flink.runtime.jobgraph.JobStatus}} to a process exit code.
> We currently map {{ApplicationStatus.CANCELED}} to a non-zero exit code 
> ({{1444}}). Since cancellation is a user-triggered operation I would consider 
> this to be a successful exit and map it to exit code {{0}}.
> Our current behavior causes applications running via the 
> {{StandaloneJobClusterEntryPoint}} and Kubernetes pods, as documented in 
> [flink-container|https://github.com/apache/flink/tree/master/flink-container/kubernetes],
> to be restarted immediately when cancelled. This only leaves the option of 
> killing the respective job cluster master container.
> The {{ApplicationStatus}} is also used in the YARN and Mesos clients, but I'm 
> not familiar with that part of the code base and can't assess how changing the 
> exit code would affect these clients. A quick usage scan for 
> {{ApplicationStatus.CANCELED}} did not surface any problematic usages though.
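
The change described in the issue can be sketched as follows. This is an illustrative stand-in, not the Flink source: the enum `ExitCodeSketch` and the helper `RestartSketch.wouldRestart` are hypothetical names. Only the values `1443` for `FAILED` and the old `1444` for `CANCELED` come from the diff and the issue text above; `SUCCEEDED` mapping to `0` is an assumption. The helper models a supervisor that restarts a container whenever its process exits with a non-zero code, which is how the restart behaviour described in the issue is assumed to arise.

```java
// Hypothetical stand-in for the status-to-exit-code mapping discussed in the issue.
enum ExitCodeSketch {
    SUCCEEDED(0),  // assumption: success exits with 0
    FAILED(1443),  // value shown in the diff context above
    CANCELED(0);   // proposed by this issue; previously 1444

    private final int processExitCode;

    ExitCodeSketch(int processExitCode) {
        this.processExitCode = processExitCode;
    }

    int processExitCode() {
        return processExitCode;
    }
}

// Hypothetical model of a supervisor that restarts on any non-zero exit code.
final class RestartSketch {
    private RestartSketch() {}

    static boolean wouldRestart(int exitCode) {
        return exitCode != 0;
    }
}
```

With the old value, `RestartSketch.wouldRestart(1444)` is `true`, so a cancelled job would be brought back up; with the proposed mapping, `RestartSketch.wouldRestart(ExitCodeSketch.CANCELED.processExitCode())` is `false`.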



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
