[ 
https://issues.apache.org/jira/browse/FLINK-30505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17652087#comment-17652087
 ] 

Xintong Song commented on FLINK-30505:
--------------------------------------

I don't see how the proposed change makes a difference. The exception in the 
2nd screenshot is not the _real reason_ for the TM failure. It says practically 
the same thing as the exception in the 1st screenshot: that the TM is no longer 
reachable. To understand the real reason, you need to check the TM/K8s logs 
anyway.

> Close the connection between TM and JM when task executor failed
> ----------------------------------------------------------------
>
>                 Key: FLINK-30505
>                 URL: https://issues.apache.org/jira/browse/FLINK-30505
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Task
>    Affects Versions: 1.16.0
>            Reporter: Yongming Zhang
>            Priority: Major
>             Fix For: 1.17.0
>
>
> When the resource manager detects that a task executor has failed, it closes 
> its connection with that task executor. At this point, jobs running on this TM 
> fail for other reasons (no longer reachable or heartbeat timeout).
> !https://intranetproxy.alipay.com/skylark/lark/0/2022/png/336411/1672047809511-a4b8b5d9-f11f-483c-a113-b42290a33250.png|width=1160,id=uc24b1166!
> If the connection between the task executor and the job master is also closed 
> when the resource manager detects that a task executor has failed, the real 
> reason for the task executor failure will appear in "Root Exception". This will 
> make it easier for users to find problems.
> !https://intranetproxy.alipay.com/skylark/lark/0/2022/png/336411/1672048733572-2b5b7be4-087d-46ae-9c8d-6ad5a1344019.png|width=1141,id=u947d8c4e!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
