[
https://issues.apache.org/jira/browse/TEZ-4097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17048608#comment-17048608
]
László Bodor commented on TEZ-4097:
-----------------------------------
Thanks for the review, [~ashutoshc]. Pushed to master.
> Report localHostname in Fetcher and FetcherOrderedGrouped failure log messages
> ------------------------------------------------------------------------------
>
> Key: TEZ-4097
> URL: https://issues.apache.org/jira/browse/TEZ-4097
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: László Bodor
> Assignee: László Bodor
> Priority: Minor
> Fix For: 0.10.0
>
> Attachments: TEZ-4097.01.patch
>
>
> Currently, a fetch failure is reported like this:
> {code}
> 2019-11-05 02:50:35,972 [WARN] [Fetcher_B {Map_4} #1] |shuffle.Fetcher|:
> Fetch Failure from host while connecting: other_host, attempt:
> InputAttemptIdentifier [inputIdentifier=1, attemptNumber=0,
> pathComponent=attempt_1572936153637_0005_1_00_000000_0_10003, spillType=0,
> spillId=-1] Informing ShuffleManager:
> java.net.SocketTimeoutException: Read timed out
> ...
> {code}
> For debugging network, SSL, and similar issues on a cluster, it would be
> convenient to see the local host's name in these messages (it is available in
> the fetcher as the localHostname property), because in the logs collected by
> the YARN CLI it is not obvious at first sight which host reported the failure.
> The same applies to FetcherOrderedGrouped, which reports something like:
> {code}
> 2019-11-05 03:13:11,046 [WARN] [Fetcher_O {Map_1} #0]
> |orderedgrouped.FetcherOrderedGrouped|: Failed to verify reply after
> connecting to other_host:13562 with 1 inputs pending
> javax.net.ssl.SSLHandshakeException:
> sun.security.validator.ValidatorException: PKIX path building failed:
> sun.security.provider.certpath.SunCertPathBuilderException: unable to find
> valid certification path to requested target
> {code}
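
As a rough illustration of the idea described above, the following minimal sketch shows a failure message that names both the local and the remote host. It is not the code from TEZ-4097.01.patch: the class FetchFailureMessage, the helpers describeFailure and resolveLocalHostname, and the exact message wording are all assumptions made for this example. Only the localHostname name and the sample values come from the issue text.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical illustration only; not the actual Tez Fetcher code.
public class FetchFailureMessage {

    // Builds a failure message that names the local host as well as the remote one.
    // The wording is an assumption, loosely modeled on the log line quoted in the issue.
    static String describeFailure(String localHostname, String remoteHost, String attempt) {
        return "Fetch Failure while connecting from " + localHostname
            + " to: " + remoteHost + ", attempt: " + attempt;
    }

    // Resolves the local host name with the JDK. The issue says the fetcher already
    // holds this value as localHostname, so this lookup is a stand-in for the example.
    static String resolveLocalHostname() {
        try {
            return InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            return "unknown_local_host";
        }
    }

    public static void main(String[] args) {
        String message = describeFailure(
            resolveLocalHostname(),
            "other_host",
            "InputAttemptIdentifier [inputIdentifier=1, attemptNumber=0]");
        System.out.println(message);
    }
}
```

With both host names in a single line, a reader of aggregated logs can tell which side of the connection reported the timeout or handshake error without tracing the container back to its node.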