[ https://issues.apache.org/jira/browse/TEZ-4357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ayush Saxena resolved TEZ-4357.
-------------------------------
    Fix Version/s: 0.10.4
       Resolution: Fixed

> Report url to logs in case of fetcher connection failure
> --------------------------------------------------------
>
>                 Key: TEZ-4357
>                 URL: https://issues.apache.org/jira/browse/TEZ-4357
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>             Fix For: 0.10.4
>
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Currently, when Fetcher and FetcherOrderedGrouped fail on getInputStream,
> like:
> {code}
> 2021-12-03 08:32:04,634 [WARN] [Fetcher_O {expDataFile} #11] |orderedgrouped.FetcherOrderedGrouped|: Failed to verify reply after connecting from hwc7213-6.hwc7213.root.hwx.site to hwc7213-7.hwc7213.root.hwx.site:13562 with 1 inputs pending
> java.net.SocketTimeoutException: Read timed out
>     at java.net.SocketInputStream.socketRead0(Native Method)
>     at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>     at java.net.SocketInputStream.read(SocketInputStream.java:171)
>     at java.net.SocketInputStream.read(SocketInputStream.java:141)
>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>     at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>     at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735)
>     at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678)
>     at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1593)
>     at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1498)
>     at org.apache.tez.http.HttpConnection.getInputStream(HttpConnection.java:260)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.setupConnection(FetcherOrderedGrouped.java:362)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.copyFromHost(FetcherOrderedGrouped.java:265)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.fetchNext(FetcherOrderedGrouped.java:184)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.callInternal(FetcherOrderedGrouped.java:196)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.callInternal(FetcherOrderedGrouped.java:59)
> {code}
> they don't report the URL, which is important to understand the failure. I'm
> investigating a shuffle failure, looking at INFO level logs, and in case of
> failure the URL itself is not printed, but I can see in the ShuffleHandler
> logs that the request is not a valid SSL request, which makes me think that
> the fetcher simply works with invalid settings. If I saw the base URL at
> least, I could make sure it was trying to connect in a secure way
> (--> protocol: https).
> Logging isSecureShuffle could also be useful, but the URL contains all the
> info we need in case of a failure.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
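
For illustration only, the request in the issue (include the URL in the failure log so the protocol can be checked against the shuffle settings) could be sketched roughly as below. This is not the actual TEZ-4357 patch: the class `FetchFailureLog`, both method names, and the message layout are hypothetical, and the sample URL is built only from the host and port that appear in the log above.

```java
import java.net.URI;

// Hypothetical helper, not part of Apache Tez: shows one way a fetcher
// failure message could carry the URL and the secure-shuffle flag.
public class FetchFailureLog {

    // Builds a warning message that names the URL the fetcher tried to reach.
    public static String describe(String localHost, URI baseUrl,
                                  int pendingInputs, boolean secureShuffle) {
        return String.format(
            "Failed to verify reply after connecting from %s to %s:%d"
                + " with %d inputs pending (url=%s, isSecureShuffle=%b)",
            localHost, baseUrl.getHost(), baseUrl.getPort(),
            pendingInputs, baseUrl, secureShuffle);
    }

    // True when the URL scheme agrees with the secure-shuffle setting
    // (https when secure shuffle is on, http otherwise).
    public static boolean schemeMatches(URI baseUrl, boolean secureShuffle) {
        String expected = secureShuffle ? "https" : "http";
        return expected.equalsIgnoreCase(baseUrl.getScheme());
    }

    public static void main(String[] args) {
        // Assumed example: plain http URL while secure shuffle is expected.
        URI url = URI.create("http://hwc7213-7.hwc7213.root.hwx.site:13562");
        System.out.println(
            describe("hwc7213-6.hwc7213.root.hwx.site", url, 1, true));
        System.out.println("scheme matches settings: "
            + schemeMatches(url, true));
    }
}
```

With the URL in the message, a mismatch such as an `http` URL under a secure-shuffle configuration is visible directly in the fetcher log, without cross-checking the ShuffleHandler logs.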