Prasanth Jayachandran created TEZ-4174:
------------------------------------------

             Summary: [Kubernetes] Fetcher should report connection failure on SocketException
                 Key: TEZ-4174
                 URL: https://issues.apache.org/jira/browse/TEZ-4174
             Project: Apache Tez
          Issue Type: Bug
    Affects Versions: 0.10.0
            Reporter: Prasanth Jayachandran
            Assignee: Prasanth Jayachandran


Fetcher treats a fetch as a connection failure only when http.connect throws 
an exception. In a Kubernetes environment, where there can be intermediate 
proxies, getInputStream on the HTTP connection can also throw a connection 
reset error (5xx). These errors should be treated as connection failures as well.
{code:java}
2020-05-08 17:03:54.080  WARN [Fetcher_B {Map_3} #3] shuffle.Fetcher: Fetch Failure while connecting from 10.117.155.27 to: 10.117.154.115:25551, attempt: InputAttemptIdentifier [inputIdentifier=0, attemptNumber=0, pathComponent=attempt_1588982534035_0000_1_00_000000_0_10030, spillType=0, spillId=-1] Informing ShuffleManager:
java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(SocketInputStream.java:210)
        at java.net.SocketInputStream.read(SocketInputStream.java:141)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
        at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735)
        at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678)
        at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:706)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1593)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1498)
        at org.apache.tez.http.HttpConnection.getInputStream(HttpConnection.java:260)
        at org.apache.tez.runtime.library.common.shuffle.Fetcher.setupConnection(Fetcher.java:530)
        at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:563)
        at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:487)
        at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:285)
        at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:76)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
        at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
        at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
{code}
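
The proposed behaviour could be sketched roughly as follows. This is only an illustration, not the actual Tez change: the class ConnectionFailureClassifier, the Phase enum and the isConnectionFailure method are hypothetical names that do not exist in Tez. The only assumption carried over from the report is that a java.net.SocketException thrown while opening the input stream should count as a connection failure, just like any exception thrown by the connect call.

```java
import java.io.IOException;
import java.net.SocketException;

// Hypothetical helper, not part of Tez: decides whether an I/O error seen
// during a fetch should be reported as a connection failure.
final class ConnectionFailureClassifier {

    // Hypothetical fetch phases, introduced only for this sketch.
    enum Phase { CONNECT, OPEN_INPUT_STREAM, READ_DATA }

    private ConnectionFailureClassifier() {
    }

    static boolean isConnectionFailure(Phase phase, IOException e) {
        if (phase == Phase.CONNECT) {
            // Behaviour described in the report: any exception thrown while
            // connecting is already a connection failure.
            return true;
        }
        if (phase == Phase.OPEN_INPUT_STREAM) {
            // Proposed addition: a reset by an intermediate proxy surfaces
            // here as a SocketException (ConnectException is a subclass).
            return e instanceof SocketException;
        }
        // Errors while reading shuffle data are left to other handling.
        return false;
    }
}
```

With a classifier like this, the "Connection reset" in the trace above, raised from getInputStream, would be reported the same way as a failed connect, while a plain IOException in the same phase would not.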


