[ 
https://issues.apache.org/jira/browse/FLINK-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006553#comment-17006553
 ] 

Yang Wang edited comment on FLINK-15449 at 1/2/20 3:17 AM:
-----------------------------------------------------------

I think it is a valid user experience improvement. However, if we retain all 
the lost TaskManagers, the JobManager will use more memory. When TaskManagers 
fail over frequently, the JobManager could run out of memory (OOM). If we add a 
threshold for removing lost TaskManagers, it will not make much difference 
compared with the current behavior.

I want to share how to debug a lost TaskManager today. First, find the 
NodeManager that the lost TaskManager was running on. Then construct the log 
URL using the pattern

http://{RM_Address:PORT}/node/containerlogs/{container_id}/{user}

The log URL remains usable until the application finishes.
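
The URL pattern above can be filled in programmatically. The following is a 
minimal sketch, not part of Flink or YARN: the function name 
container_log_url and the example host, container id, and user are 
hypothetical values chosen only for illustration.

```python
from urllib.parse import quote


def container_log_url(address: str, container_id: str, user: str) -> str:
    """Fill in the log URL pattern quoted in the comment.

    address      -- the host:port placeholder ({RM_Address:PORT} in the pattern)
    container_id -- the YARN container id of the lost TaskManager
    user         -- the user that owns the application
    """
    # Percent-encode the path segments so unusual characters cannot break the URL.
    return (
        f"http://{address}/node/containerlogs/"
        f"{quote(container_id, safe='')}/{quote(user, safe='')}"
    )


if __name__ == "__main__":
    # All three values below are made-up examples.
    print(container_log_url(
        "host.example.com:8042",
        "container_1577000000000_0001_01_000002",
        "flink",
    ))
```

As noted in the comment, the resulting URL is only expected to work while 
the application is still running.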


was (Author: fly_in_gis):
I think it is a valid user experience improvement. However, if we retain all 
the lost TaskManagers, the JobManager will use more memory. When TaskManagers 
fail over frequently, the JobManager could run out of memory (OOM). If we add a 
threshold for removing lost TaskManagers, it will not make much difference 
compared with the current behavior.

I want to share how to debug a lost TaskManager today. First, find the 
NodeManager that the lost TaskManager was running on. Then construct the log 
URL using the pattern

http://{RM_Address:PORT}/node/containerlogs/{container_id}/{user}

The log URL remains usable until the application finishes.

> Retain lost task managers on Flink UI
> -------------------------------------
>
>                 Key: FLINK-15449
>                 URL: https://issues.apache.org/jira/browse/FLINK-15449
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / YARN
>    Affects Versions: 1.9.1
>            Reporter: Victor Wong
>            Priority: Major
>
> With Flink on Yarn, our TaskManagers are sometimes killed because of OOM, 
> heartbeat timeout, or other reasons, and it is not convenient to check the 
> logs of the lost TaskManager.
> Can we retain the lost task managers on Flink UI, and provide the log service 
> through Yarn (we can redirect the URL of log/stdout to Yarn container 
> log/stdout)?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
