[
https://issues.apache.org/jira/browse/FLINK-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006553#comment-17006553
]
Yang Wang commented on FLINK-15449:
-----------------------------------
I think it is a valid user experience improvement. However, if we retain all
the lost TaskManagers, it will cost more memory in the JobManager. When
TaskManagers fail over frequently, the JobManager could eventually OOM. If we
add a threshold for removing lost TaskManagers, the behavior will not differ
much from what we have now.
I want to share how to debug a lost TaskManager today. First, you need to find
which NodeManager the lost TaskManager was running on. Then use the pattern
http://{RM_Address:PORT}/node/containerlogs/{container_id}/{user} to
construct the log URL. The log URL can be used until the application
finishes.
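
The URL construction above can be sketched in a few lines of Python. This is
only an illustration of the pattern, not code from Flink or YARN: the function
name and the example host, container id, and user are hypothetical, and the
address is assumed to be the web address of the node that ran the container.

```python
from urllib.parse import quote


def container_log_url(address: str, container_id: str, user: str) -> str:
    """Fill in http://{address}/node/containerlogs/{container_id}/{user}.

    `address` is a "host:port" string. The container id and user are
    percent-encoded so unusual characters cannot break the path.
    """
    return "http://{}/node/containerlogs/{}/{}".format(
        address,
        quote(container_id, safe=""),
        quote(user, safe=""),
    )


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(container_log_url(
        "nm-host.example.com:8042",
        "container_1577000000000_0001_01_000002",
        "flink",
    ))
```

The container id of the lost TaskManager usually has to be looked up in the
JobManager log first, since the TaskManager no longer appears in the Flink UI.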
> Retain lost task managers on Flink UI
> -------------------------------------
>
> Key: FLINK-15449
> URL: https://issues.apache.org/jira/browse/FLINK-15449
> Project: Flink
> Issue Type: Improvement
> Components: Deployment / YARN
> Affects Versions: 1.9.1
> Reporter: Victor Wong
> Priority: Major
>
> With Flink on Yarn, our TaskManagers are sometimes killed because of OOM, a
> heartbeat timeout, or some other reason, and it is not convenient to check
> the logs of the lost TaskManager.
> Can we retain the lost TaskManagers on the Flink UI, and provide the log
> service through Yarn (we can redirect the URL of log/stdout to the Yarn
> container log/stdout)?