[jira] [Commented] (FLINK-29639) Add ResourceId in TransportException for debugging

Weijie Guo (Jira) Mon, 31 Oct 2022 00:58:07 -0700


    [ https://issues.apache.org/jira/browse/FLINK-29639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626451#comment-17626451 ]


Weijie Guo commented on FLINK-29639:
------------------------------------

[~Jiangang] Thank you for your proposal. I think this is very useful: I have
run into the same situation myself, where it is hard to tell from the error
report alone which pod has the problem.

I have only one question: where are you going to store the resourceId of the
upstream TM? Is it in the ConnectionID? Wherever it ends up, we had better
verify that it has no negative impact on task deployment, because it will
increase the size of the shuffle descriptor.
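One cheap way to get a first feel for the size concern is to compare serialized sizes with and without the extra field. Below is a minimal Java sketch, assuming plain Java serialization as a rough proxy for the descriptor's wire size. `ConnectionIdSketch` and `DescriptorSizeCheck` are hypothetical classes, not Flink's `ConnectionID` or its deployment code, and the resource ID is modeled as a plain `String`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;

// Hypothetical stand-in for a connection identifier; NOT Flink's ConnectionID.
final class ConnectionIdSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    final InetSocketAddress address;
    final int connectionIndex;
    // Hypothetical extra field: the upstream TaskManager's resource ID, or null if unknown.
    final String resourceId;

    ConnectionIdSketch(InetSocketAddress address, int connectionIndex, String resourceId) {
        this.address = address;
        this.connectionIndex = connectionIndex;
        this.resourceId = resourceId;
    }
}

public class DescriptorSizeCheck {
    // Returns the number of bytes produced by plain Java serialization of obj.
    static int serializedSize(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        // Unresolved address: avoids a DNS lookup in this illustration.
        InetSocketAddress addr = InetSocketAddress.createUnresolved("tm-host", 6121);
        int without = serializedSize(new ConnectionIdSketch(addr, 0, null));
        int with = serializedSize(new ConnectionIdSketch(addr, 0, "container_e01_000002"));
        System.out.println("extra bytes per descriptor: " + (with - without));
    }
}
```

This only shows the per-descriptor overhead of one extra string field; a real check would still need to measure the actual descriptors and deployment time for a job with many parallel subtasks, where the overhead is multiplied.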

> Add ResourceId in TransportException for debugging 
> ---------------------------------------------------
>
>                 Key: FLINK-29639
>                 URL: https://issues.apache.org/jira/browse/FLINK-29639
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>            Reporter: Liu
>            Assignee: Liu
>            Priority: Major
>
> When the taskmanager is lost, only the host and port are shown in the
> exception, which makes it hard to identify the exact taskmanager by its
> resourceId. Adding the ResourceId to the exception would help a lot in
> debugging the job.
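To make the proposal concrete, here is a minimal Java sketch of an exception whose message carries the ResourceId next to the host and port. `TransportExceptionSketch` is a hypothetical class, not Flink's `TransportException`, and the resource ID is modeled as a plain `String` that may be null when it is not known.

```java
import java.io.IOException;
import java.net.InetSocketAddress;

// Hypothetical exception type; NOT Flink's TransportException.
class TransportExceptionSketch extends IOException {
    private static final long serialVersionUID = 1L;

    private final InetSocketAddress address;
    private final String resourceId; // hypothetical: may be null if unknown

    TransportExceptionSketch(String message, InetSocketAddress address, String resourceId, Throwable cause) {
        super(formatMessage(message, address, resourceId), cause);
        this.address = address;
        this.resourceId = resourceId;
    }

    // Appends host:port and, when known, the TaskManager's resource ID to the message.
    private static String formatMessage(String message, InetSocketAddress address, String resourceId) {
        String where = address.getHostString() + ":" + address.getPort();
        String who = (resourceId == null) ? "unknown" : resourceId;
        return message + " [remote: " + where + ", resourceId: " + who + "]";
    }

    InetSocketAddress getAddress() { return address; }

    String getResourceId() { return resourceId; }
}

public class TransportExceptionDemo {
    public static void main(String[] args) {
        InetSocketAddress addr = InetSocketAddress.createUnresolved("10.0.0.12", 6121);
        TransportExceptionSketch e = new TransportExceptionSketch(
                "Connection to upstream TaskManager lost", addr, "container_e01_000002", null);
        // The message now names the TaskManager, not just its host and port.
        System.out.println(e.getMessage());
    }
}
```

Keeping the resource ID as a separate field (in addition to the formatted message) lets handlers and log tooling read it without parsing the message text.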



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
