[
https://issues.apache.org/jira/browse/FLINK-8900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16458752#comment-16458752
]
ASF GitHub Bot commented on FLINK-8900:
---------------------------------------
GitHub user StephanEwen opened a pull request:
https://github.com/apache/flink/pull/5944
[FLINK-8900] [yarn] Set correct application status when job is finished
## What is the purpose of the change
When finite Flink applications (batch jobs) are submitted to YARN in
detached mode, the final application status is currently always the same,
regardless of the job's outcome, because the job's result is not passed to
the logic that initiates the application shutdown.
This PR forwards the final job status via a future that is used to register
the shutdown handlers.
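As a rough illustration of the idea (not the code in this PR), the sketch below shows a termination future carrying the final job status to a shutdown handler. All class and method names here (`SketchDispatcher`, `SketchEntrypoint`, `registerShutdownHandler`, and the local `ApplicationStatus` enum) are hypothetical stand-ins and are not Flink's actual API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

public class TerminationFutureSketch {

    /** Hypothetical stand-in for the status that is reported to YARN. */
    enum ApplicationStatus { SUCCEEDED, FAILED, CANCELED, UNKNOWN }

    /** Hypothetical dispatcher that runs a single job and exposes its outcome as a future. */
    static class SketchDispatcher {
        private final CompletableFuture<ApplicationStatus> jobTerminationFuture =
                new CompletableFuture<>();

        CompletableFuture<ApplicationStatus> getJobTerminationFuture() {
            return jobTerminationFuture;
        }

        /** Called once the job reaches a globally terminal state. */
        void jobReachedTerminalState(ApplicationStatus status) {
            jobTerminationFuture.complete(status);
        }
    }

    /** Hypothetical cluster entrypoint that shuts down with the status it was given. */
    static class SketchEntrypoint {
        final AtomicReference<ApplicationStatus> reportedStatus = new AtomicReference<>();

        /** Registers the shutdown logic on the future instead of using a fixed status. */
        void registerShutdownHandler(CompletableFuture<ApplicationStatus> terminationFuture) {
            terminationFuture.whenComplete((status, failure) -> {
                // If the future itself failed, the job outcome is not known.
                ApplicationStatus finalStatus =
                        failure == null ? status : ApplicationStatus.UNKNOWN;
                shutDown(finalStatus);
            });
        }

        /** In a real deployment this is where the status would be reported to YARN. */
        void shutDown(ApplicationStatus finalStatus) {
            reportedStatus.set(finalStatus);
        }
    }

    public static void main(String[] args) {
        SketchDispatcher dispatcher = new SketchDispatcher();
        SketchEntrypoint entrypoint = new SketchEntrypoint();
        entrypoint.registerShutdownHandler(dispatcher.getJobTerminationFuture());

        // Simulate a job that fails, e.g. because its input file does not exist.
        dispatcher.jobReachedTerminalState(ApplicationStatus.FAILED);
        System.out.println("Reported status: " + entrypoint.reportedStatus.get());
    }
}
```

The point of the design is that the shutdown handler no longer decides the status itself; it only reacts to whatever value the future is completed with.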
## Brief change log
- Introduce the `JobTerminationFuture` in the `MiniDispatcher`
## Verifying this change
```
bin/flink run -m yarn-cluster -yjm 2048 -ytm 2048 ./examples/streaming/WordCount.jar
```
- Run the batch job as described above on YARN so that it succeeds, and
check that the final application status is successful.
- Run the batch job with a parameter pointing to a nonexistent input file
on YARN, and check that the final application status is failed.
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): (yes / **no**)
- The public API, i.e., is any changed class annotated with
`@Public(Evolving)`: (yes / **no**)
- The serializers: (yes / **no** / don't know)
- The runtime per-record code paths (performance sensitive): (yes /
**no** / don't know)
- Anything that affects deployment or recovery: JobManager (and its
components), Checkpointing, Yarn/Mesos, ZooKeeper: (**yes** / no / don't know)
- The S3 file system connector: (yes / **no** / don't know)
## Documentation
- Does this pull request introduce a new feature? (yes / **no**)
- If yes, how is the feature documented? (**not applicable** / docs /
JavaDocs / not documented)
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/StephanEwen/incubator-flink yarn_fix
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/5944.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #5944
----
commit f4130c64420e2ad2acb680869c9b84aa5dbcc7c7
Author: Stephan Ewen <sewen@...>
Date: 2018-04-30T07:55:50Z
[hotfix] [tests] Update log4j-test.properties
Brings the logging definition in sync with other projects.
Updates the classname for the suppressed logger in Netty to account for the
new shading model introduced in Flink 1.4.
commit 5fcc9aca392cbcd5dfa474b0a286868b44836f23
Author: Stephan Ewen <sewen@...>
Date: 2018-04-27T16:57:27Z
[FLINK-8900] [yarn] Set correct application status when job is finished
----
> YARN FinalStatus always shows as KILLED with Flip-6
> ---------------------------------------------------
>
> Key: FLINK-8900
> URL: https://issues.apache.org/jira/browse/FLINK-8900
> Project: Flink
> Issue Type: Bug
> Components: YARN
> Affects Versions: 1.5.0, 1.6.0
> Reporter: Nico Kruber
> Assignee: Gary Yao
> Priority: Blocker
> Labels: flip-6
> Fix For: 1.5.0
>
>
> Whenever I run a simple word count like this one on YARN with Flip-6
> enabled,
> {code}
> ./bin/flink run -m yarn-cluster -yjm 768 -ytm 3072 -ys 2 -p 20 -c org.apache.flink.streaming.examples.wordcount.WordCount ./examples/streaming/WordCount.jar --input /usr/share/doc/rsync-3.0.6/COPYING
> {code}
> it will show up as {{KILLED}} in the {{State}} and {{FinalStatus}} columns
> even though the program ran successfully, as the following log shows
> (irrespective of whether FLINK-8899 occurs):
> {code}
> 2018-03-08 16:48:39,049 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph - Job Streaming
> WordCount (11a794d2f5dc2955d8015625ec300c20) switched from state RUNNING to
> FINISHED.
> 2018-03-08 16:48:39,050 INFO
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Stopping
> checkpoint coordinator for job 11a794d2f5dc2955d8015625ec300c20
> 2018-03-08 16:48:39,050 INFO
> org.apache.flink.runtime.checkpoint.StandaloneCompletedCheckpointStore -
> Shutting down
> 2018-03-08 16:48:39,078 INFO
> org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Job
> 11a794d2f5dc2955d8015625ec300c20 reached globally terminal state FINISHED.
> 2018-03-08 16:48:39,151 INFO
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Register
> TaskManager e58efd886429e8f080815ea74ddfa734 at the SlotManager.
> 2018-03-08 16:48:39,221 INFO org.apache.flink.runtime.jobmaster.JobMaster
> - Stopping the JobMaster for job Streaming
> WordCount(11a794d2f5dc2955d8015625ec300c20).
> 2018-03-08 16:48:39,270 INFO org.apache.flink.runtime.jobmaster.JobMaster
> - Close ResourceManager connection
> 43f725adaee14987d3ff99380701f52f: JobManager is shutting down..
> 2018-03-08 16:48:39,270 INFO org.apache.flink.yarn.YarnResourceManager
> - Disconnect job manager
> [email protected]://[email protected]:34281/user/jobmanager_0
> for job 11a794d2f5dc2955d8015625ec300c20 from the resource manager.
> 2018-03-08 16:48:39,349 INFO
> org.apache.flink.runtime.jobmaster.slotpool.SlotPool - Suspending
> SlotPool.
> 2018-03-08 16:48:39,349 INFO
> org.apache.flink.runtime.jobmaster.slotpool.SlotPool - Stopping
> SlotPool.
> 2018-03-08 16:48:39,349 INFO
> org.apache.flink.runtime.jobmaster.JobManagerRunner -
> JobManagerRunner already shutdown.
> 2018-03-08 16:48:39,775 INFO
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Register
> TaskManager 4e1fb6c8f95685e24b6a4cb4b71ffb92 at the SlotManager.
> 2018-03-08 16:48:39,846 INFO
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Register
> TaskManager b5bce0bdfa7fbb0f4a0905cc3ee1c233 at the SlotManager.
> 2018-03-08 16:48:39,876 INFO
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint - RECEIVED
> SIGNAL 15: SIGTERM. Shutting down as requested.
> 2018-03-08 16:48:39,910 INFO
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Register
> TaskManager a35b0690fdc6ec38bbcbe18a965000fd at the SlotManager.
> 2018-03-08 16:48:39,942 INFO
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Register
> TaskManager 5175cabe428bea19230ac056ff2a17bb at the SlotManager.
> 2018-03-08 16:48:39,974 INFO org.apache.flink.runtime.blob.BlobServer
> - Stopped BLOB server at 0.0.0.0:46511
> 2018-03-08 16:48:39,975 INFO
> org.apache.flink.runtime.blob.TransientBlobCache - Shutting down
> BLOB cache
> {code}