[ 
https://issues.apache.org/jira/browse/FLINK-23174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17381937#comment-17381937
 ] 

Yuan Mei edited comment on FLINK-23174 at 7/16/21, 9:51 AM:
------------------------------------------------------------

Hey [~Bo Cui]

After reading the PR, I think it aims at two sets of logging enhancements. 
Please correct me if I have misunderstood:
 # Add thread-dump stack traces in some places.
 Would you mind sharing what extra information these stack traces expose? The 
call stack can already be traced through the function calls. Also, I think the 
most important information should already be included in the stack trace of 
the {{throwable error}}, which is printed when the error is handled. If it is 
not, we should probably understand why and fix it there.
 # Log the remote channel address in {{notifyAllChannelsOfErrorAndClose}} (I do 
not think the other logging added in this method provides any additional 
information, for the same reason as point 1).
 To the best of my knowledge, if an error is caused remotely (on the sender side 
or in the network), it is wrapped in a {{RemoteTransportException}}, which 
includes the remote socket address; otherwise, the error can be any type of 
Throwable. I think this should be enough in most cases.
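To illustrate point 1, here is a minimal plain-Java sketch (no Flink classes involved; {{failingCall}}, {{callerOfFailingCall}} and the other names are hypothetical). It compares the stack trace recorded by the Throwable itself with the current thread's stack captured inside the handler, which is roughly what an extra thread dump at the logging site would add.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Arrays;

/**
 * Sketch: a Throwable records the call stack at the point where it is constructed,
 * so printing it already shows where the failure originated. A thread dump taken
 * in the error handler only shows the handler's own stack.
 */
public class StackTraceDemo {

    // Hypothetical stand-in for code deep in the network stack that fails.
    static void failingCall() {
        throw new IllegalStateException("channel closed");
    }

    static void callerOfFailingCall() {
        failingCall();
    }

    /** Returns the error's stack trace, printed the same way a logger prints a Throwable. */
    public static String errorStackTrace() {
        try {
            callerOfFailingCall();
            throw new AssertionError("unreachable");
        } catch (IllegalStateException e) {
            StringWriter sw = new StringWriter();
            e.printStackTrace(new PrintWriter(sw));
            return sw.toString();
        }
    }

    /** Returns the current thread's stack inside the handler, i.e. what an extra thread dump adds. */
    public static String handlerThreadStack() {
        try {
            callerOfFailingCall();
            throw new AssertionError("unreachable");
        } catch (IllegalStateException e) {
            return Arrays.toString(Thread.currentThread().getStackTrace());
        }
    }

    public static void main(String[] args) {
        System.out.println("--- error stack trace ---");
        System.out.println(errorStackTrace());
        System.out.println("--- handler thread stack ---");
        System.out.println(handlerThreadStack());
    }
}
```

The first output names {{failingCall}} and {{callerOfFailingCall}}; the second only shows the handler and its callers. This is why a thread dump at the handling site may add little beyond logging the throwable itself.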

But if you think we should log the remote socket address in all cases, we can 
wrap the error in a different exception type (like a ChannelException that 
includes both the remote and the local address). That's the simplest way I can 
think of that does not break or pollute the existing logs.
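A minimal sketch of that wrapping idea, assuming a new exception type that does not exist in Flink today: {{ChannelException}} here is a hypothetical nested class (unrelated to Netty's class of the same name), and {{wrap}} is a hypothetical helper that would be applied before errors are handed to the channels.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.SocketAddress;

/** Sketch only: ChannelException and wrap() are hypothetical, not existing Flink APIs. */
public class ChannelErrorSketch {

    /** Hypothetical exception carrying both endpoints of the failed channel. */
    public static final class ChannelException extends RuntimeException {
        private final SocketAddress remoteAddress;
        private final SocketAddress localAddress;

        public ChannelException(String message, SocketAddress remoteAddress,
                                SocketAddress localAddress, Throwable cause) {
            // Put both addresses into the message so log statements that already
            // print the exception show them without any changes.
            super(message + " (remote: " + remoteAddress + ", local: " + localAddress + ")", cause);
            this.remoteAddress = remoteAddress;
            this.localAddress = localAddress;
        }

        public SocketAddress getRemoteAddress() { return remoteAddress; }

        public SocketAddress getLocalAddress() { return localAddress; }
    }

    /** Wraps an arbitrary cause, leaving it unchanged if it already is a ChannelException. */
    public static ChannelException wrap(Throwable cause, SocketAddress remote, SocketAddress local) {
        if (cause instanceof ChannelException) {
            return (ChannelException) cause;
        }
        return new ChannelException("Channel error", remote, local, cause);
    }

    public static void main(String[] args) {
        // Unresolved addresses avoid DNS lookups; host names are placeholders.
        SocketAddress remote = InetSocketAddress.createUnresolved("remote-host", 41234);
        SocketAddress local = InetSocketAddress.createUnresolved("local-host", 38001);
        ChannelException e = wrap(new IOException("Connection reset by peer"), remote, local);
        System.out.println(e.getMessage());
    }
}
```

Keeping the original error as the cause preserves its stack trace. Making the type unchecked is just a choice for this sketch; a checked type in the style of {{RemoteTransportException}} would work the same way.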

Please let me know what you think.


> Log improvement in Task throws Error
> ------------------------------------
>
>                 Key: FLINK-23174
>                 URL: https://issues.apache.org/jira/browse/FLINK-23174
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination, Runtime / Network
>    Affects Versions: 1.13.1
>            Reporter: Bo Cui
>            Assignee: Bo Cui
>            Priority: Major
>              Labels: pull-request-available
>
> We saw some channels close due to network jitter, which caused task and job failures.
> We can only see which remote channel caused the task/job failure,
> but we cannot see more details, such as which channel closed, the task stack...



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
