[
https://issues.apache.org/jira/browse/FLINK-12302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16849071#comment-16849071
]
lamber-ken edited comment on FLINK-12302 at 5/27/19 4:42 PM:
-------------------------------------------------------------
[~gjy], from another angle, we can analyze this issue from the code alone.
When the +MiniDispatcher#jobNotFinished+ method is called, it means the Flink
job terminated unexpectedly, so it notifies the RM to kill the YARN application
with the +ApplicationStatus.UNKNOWN+ state. The +UNKNOWN+ state is then
converted to +{{UNDEFINED}}+ by +YarnResourceManager#getYarnStatus+.
But in Hadoop, +{{UNDEFINED}}+ means the application has not yet finished.
*MiniDispatcher#jobNotFinished*
{code:java}
@Override
protected void jobNotFinished(JobID jobId) {
    super.jobNotFinished(jobId);
    // shut down since we have done our job
    jobTerminationFuture.complete(ApplicationStatus.UNKNOWN);
}
{code}
*YarnResourceManager#getYarnStatus*
{code:java}
private FinalApplicationStatus getYarnStatus(ApplicationStatus status) {
    if (status == null) {
        return FinalApplicationStatus.UNDEFINED;
    }
    else {
        switch (status) {
            case SUCCEEDED:
                return FinalApplicationStatus.SUCCEEDED;
            case FAILED:
                return FinalApplicationStatus.FAILED;
            case CANCELED:
                return FinalApplicationStatus.KILLED;
            default:
                return FinalApplicationStatus.UNDEFINED;
        }
    }
}
{code}
*Hadoop Application Status*
[FinalApplicationStatus|https://github.com/apache/hadoop-common/blob/42a61a4fbc88303913c4681f0d40ffcc737e70b5/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/FinalApplicationStatus.java#L32]
{code:java}
/**
 * Enumeration of various final states of an Application.
 */
@Public
@Stable
public enum FinalApplicationStatus {
    /** Undefined state when either the application has not yet finished */
    UNDEFINED,
    /** Application which finished successfully. */
    SUCCEEDED,
    /** Application which failed. */
    FAILED,
    /** Application which was terminated by a user or admin. */
    KILLED
}
{code}
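To make the chain above concrete, here is a minimal, self-contained sketch. The two enums are local stand-ins for Flink's +ApplicationStatus+ and Hadoop's +FinalApplicationStatus+, and the class name and the second method ({{getYarnStatusTreatingUnknownAsFailed}}) are hypothetical: they only illustrate the behaviour the issue description asks for and are not necessarily what the attached patch does.

```java
public class YarnStatusMappingSketch {

    // Local stand-in for Flink's ApplicationStatus (assumption: only these four values matter here).
    public enum ApplicationStatus { SUCCEEDED, FAILED, CANCELED, UNKNOWN }

    // Local stand-in for Hadoop's FinalApplicationStatus.
    public enum FinalApplicationStatus { UNDEFINED, SUCCEEDED, FAILED, KILLED }

    // Mirrors the switch quoted from YarnResourceManager#getYarnStatus:
    // UNKNOWN falls into the default branch and becomes UNDEFINED.
    public static FinalApplicationStatus getYarnStatus(ApplicationStatus status) {
        if (status == null) {
            return FinalApplicationStatus.UNDEFINED;
        }
        switch (status) {
            case SUCCEEDED:
                return FinalApplicationStatus.SUCCEEDED;
            case FAILED:
                return FinalApplicationStatus.FAILED;
            case CANCELED:
                return FinalApplicationStatus.KILLED;
            default:
                return FinalApplicationStatus.UNDEFINED;
        }
    }

    // Hypothetical alternative: report UNKNOWN as FAILED so that YARN does not
    // show a finished application as "not yet finished".
    public static FinalApplicationStatus getYarnStatusTreatingUnknownAsFailed(ApplicationStatus status) {
        if (status == ApplicationStatus.UNKNOWN) {
            return FinalApplicationStatus.FAILED;
        }
        return getYarnStatus(status);
    }

    public static void main(String[] args) {
        // Print both mappings side by side for every Flink-side status.
        for (ApplicationStatus s : ApplicationStatus.values()) {
            System.out.println(s + " -> " + getYarnStatus(s)
                    + " (quoted mapping), " + getYarnStatusTreatingUnknownAsFailed(s)
                    + " (hypothetical mapping)");
        }
    }
}
```

Running {{main}} prints one line per status; only the +UNKNOWN+ row differs between the two mappings.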
was (Author: lamber-ken):
[~gjy], from another angle, we can analyze this issue from the code alone.
When the +MiniDispatcher#jobNotFinished+ method is called, it means the Flink
job terminated unexpectedly, so it notifies the RM to kill the YARN application
with the +ApplicationStatus.UNKNOWN+ state.
But in Hadoop, +{{UNDEFINED}}+ means the application has not yet finished.
*MiniDispatcher#jobNotFinished*
{code:java}
@Override
protected void jobNotFinished(JobID jobId) {
    super.jobNotFinished(jobId);
    // shut down since we have done our job
    jobTerminationFuture.complete(ApplicationStatus.UNKNOWN);
}
{code}
*Hadoop Application Status*
[FinalApplicationStatus|https://github.com/apache/hadoop-common/blob/42a61a4fbc88303913c4681f0d40ffcc737e70b5/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/FinalApplicationStatus.java#L32]
{code:java}
/**
 * Enumeration of various final states of an Application.
 */
@Public
@Stable
public enum FinalApplicationStatus {
    /** Undefined state when either the application has not yet finished */
    UNDEFINED,
    /** Application which finished successfully. */
    SUCCEEDED,
    /** Application which failed. */
    FAILED,
    /** Application which was terminated by a user or admin. */
    KILLED
}
{code}
> Fixed the wrong finalStatus of yarn application when application finished
> -------------------------------------------------------------------------
>
> Key: FLINK-12302
> URL: https://issues.apache.org/jira/browse/FLINK-12302
> Project: Flink
> Issue Type: Improvement
> Components: Deployment / YARN
> Affects Versions: 1.8.0
> Reporter: lamber-ken
> Assignee: lamber-ken
> Priority: Minor
> Labels: pull-request-available
> Fix For: 1.9.0
>
> Attachments: fix-bad-finalStatus.patch, flink-conf.yaml,
> image-2019-04-23-19-56-49-933.png, jobmanager-05-27.log, jobmanager-1.log,
> jobmanager-2.log, screenshot-1.png, screenshot-2.png,
> spslave4.bigdata.ly_23951, spslave5.bigdata.ly_20271, test.jar
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> A Flink job (flink-1.6.3) failed in per-job YARN cluster mode, and the
> YARN ResourceManager reran the job.
> When the job failed again, the application finished, but the finalStatus
> was +UNDEFINED+. It would be better to show the state +FAILED+.
> !image-2019-04-23-19-56-49-933.png!
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)