[ 
https://issues.apache.org/jira/browse/TEZ-4097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17048241#comment-17048241
 ] 

Ashutosh Chauhan commented on TEZ-4097:
---------------------------------------

+1

> Report localHostname in Fetcher and FetcherOrderedGrouped failure log messages
> ------------------------------------------------------------------------------
>
>                 Key: TEZ-4097
>                 URL: https://issues.apache.org/jira/browse/TEZ-4097
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Minor
>         Attachments: TEZ-4097.01.patch
>
>
> Currently, a fetch failure is reported like this:
> {code}
> 2019-11-05 02:50:35,972 [WARN] [Fetcher_B {Map_4} #1] |shuffle.Fetcher|: 
> Fetch Failure from host while connecting: other_host, attempt: 
> InputAttemptIdentifier [inputIdentifier=1, attemptNumber=0, 
> pathComponent=attempt_1572936153637_0005_1_00_000000_0_10003, spillType=0, 
> spillId=-1] Informing ShuffleManager:
> java.net.SocketTimeoutException: Read timed out
> ...
> {code}
> For debugging network, SSL, and similar issues on a cluster, it would be 
> convenient to see the local host's name in these messages (it is available in 
> the fetcher as the localHostname property), because in the logs collected by 
> the YARN CLI it is not obvious at first sight which host reported the failure.
> The same applies to FetcherOrderedGrouped, which reports something like:
> {code}
> 2019-11-05 03:13:11,046 [WARN] [Fetcher_O {Map_1} #0] 
> |orderedgrouped.FetcherOrderedGrouped|: Failed to verify reply after 
> connecting to other_host:13562 with 1 inputs pending
> javax.net.ssl.SSLHandshakeException: 
> sun.security.validator.ValidatorException: PKIX path building failed: 
> sun.security.provider.certpath.SunCertPathBuilderException: unable to find 
> valid certification path to requested target
> {code}
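
As a rough illustration of the request above, the following is a minimal sketch of a failure message that names both the local and the remote host. It is not the attached patch and not Tez code: the class FetchFailureMessage, the methods describe and resolveLocalHostname, and the exact message wording are all hypothetical; only the localHostname name and the other_host placeholder come from the issue text.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper class, not part of Apache Tez.
public class FetchFailureMessage {

    // Builds a fetch-failure message that includes the reporting (local) host
    // as well as the remote host the fetcher tried to reach.
    // The wording is an assumption made for this sketch.
    public static String describe(String localHostname, String remoteHost, String attempt) {
        return "Fetch Failure while connecting from " + localHostname
            + " to: " + remoteHost
            + ", attempt: " + attempt
            + " Informing ShuffleManager";
    }

    // Resolves the local host name with the standard JDK API; falls back to a
    // fixed marker when resolution fails, so logging never throws.
    public static String resolveLocalHostname() {
        try {
            return InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            return "unknown-local-host";
        }
    }

    public static void main(String[] args) {
        // "other_host" and "attempt_0" are placeholder values.
        System.out.println(describe(resolveLocalHostname(), "other_host", "attempt_0"));
    }
}
```

With the local host in the message, a log line collected from many containers can be attributed to a node without cross-referencing container placement.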



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
