[ 
https://issues.apache.org/jira/browse/TEZ-4097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4097:
------------------------------
    Description: 
Currently, a fetch failure is reported like this:
{code}
2019-11-05 02:50:35,972 [WARN] [Fetcher_B {Map_4} #1] |shuffle.Fetcher|: Fetch 
Failure from host while connecting: *other_host*, attempt: 
InputAttemptIdentifier [inputIdentifier=1, attemptNumber=0, 
pathComponent=attempt_1572936153637_0005_1_00_000000_0_10003, spillType=0, 
spillId=-1] Informing ShuffleManager:
java.net.SocketTimeoutException: Read timed out
...
{code}

For debugging network, SSL, and similar issues on a cluster, it would be 
convenient to see the local host's name in these messages (it is available in 
the fetcher as the localHostname property), because in the logs collected by 
the YARN CLI it is not obvious at first sight which host wrote the message.
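
As a rough illustration of the request, the sketch below builds a failure 
message that names both ends of the connection. It is not the actual Tez 
Fetcher code: the class FetchFailureMessage, the methods build and 
resolveLocalHostname, and the exact wording of the message are all 
hypothetical. Only the localHostname property name and the sample values come 
from the description above.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper for illustration only; not part of Apache Tez.
public class FetchFailureMessage {

    // Builds a fetch-failure message that includes the local host name
    // in addition to the remote host that is already reported today.
    static String build(String localHostname, String remoteHost, String attempt) {
        return "Fetch Failure while connecting from " + localHostname
            + " to: " + remoteHost
            + ", attempt: " + attempt
            + " Informing ShuffleManager: ";
    }

    // Stand-in for the fetcher's localHostname property (assumption:
    // in Tez the fetcher already holds this value and no lookup is needed).
    static String resolveLocalHostname() {
        try {
            return InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            return "unknown-host";
        }
    }

    public static void main(String[] args) {
        String attempt = "InputAttemptIdentifier [inputIdentifier=1, attemptNumber=0]";
        System.out.println(build(resolveLocalHostname(), "other_host", attempt));
    }
}
```

With both host names in the message, a single log line is enough to identify 
the failing connection even when logs from many containers are aggregated.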

  was:
Currently, a fetch failure is reported like this:
{code}
2019-11-05 02:50:35,972 [WARN] [Fetcher_B {Map_4} #1] |shuffle.Fetcher|: Fetch 
Failure from host while connecting: *other_host*, attempt: 
InputAttemptIdentifier [inputIdentifier=1, attemptNumber=0, 
pathComponent=attempt_1572936153637_0005_1_00_000000_0_10003, spillType=0, 
spillId=-1] Informing ShuffleManager:
java.net.SocketTimeoutException: Read timed out
...
{code}

For debugging network/ssl/etc. issues on cluster, it would be convenient to see 
the local host's name, which is present in the fetcher.


> Report localHostname in Fetcher failure log messages
> ----------------------------------------------------
>
>                 Key: TEZ-4097
>                 URL: https://issues.apache.org/jira/browse/TEZ-4097
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: László Bodor
>            Priority: Minor
>
> Currently, a fetch failure is reported like this:
> {code}
> 2019-11-05 02:50:35,972 [WARN] [Fetcher_B {Map_4} #1] |shuffle.Fetcher|: 
> Fetch Failure from host while connecting: *other_host*, attempt: 
> InputAttemptIdentifier [inputIdentifier=1, attemptNumber=0, 
> pathComponent=attempt_1572936153637_0005_1_00_000000_0_10003, spillType=0, 
> spillId=-1] Informing ShuffleManager:
> java.net.SocketTimeoutException: Read timed out
> ...
> {code}
> For debugging network, SSL, and similar issues on a cluster, it would be 
> convenient to see the local host's name in these messages (it is available 
> in the fetcher as the localHostname property), because in the logs collected 
> by the YARN CLI it is not obvious at first sight which host wrote the message.



