Rajesh Balamohan created TEZ-2635:
-------------------------------------
Summary: Limit number of attempts being downloaded in unordered fetch
Key: TEZ-2635
URL: https://issues.apache.org/jira/browse/TEZ-2635
Project: Apache Tez
Issue Type: Bug
Reporter: Rajesh Balamohan
{noformat}
2015-07-22 23:39:14,221 WARN [Fetcher [Map_3] #4] shuffle.Fetcher: Fetch
Failure from host while connecting: machine123, attempt: InputAttemptIdentifier
[inputIdentifier=InputIdentifier [inputIndex=12], attemptNumber=0,
pathComponent=attempt_1437098194051_0178_2_02_000012_0_10003_0,
fetchTypeInfo=INCREMENTAL_UPDATE, spillEventId=0] Informing ShuffleManager:
java.io.IOException: Server returned HTTP response code: 400 for URL:
http://machine123:13562/mapOutput?job=job_1437098194051_0178&reduce=279&map=attempt_1437098194051_0178_2_02_000012_0_10003_0,attempt_1437098194051_0178_2_02_000012_0_10003_1,attempt_1437098194051_0178_2_02_000012_0_10003_2,attempt_1437098194051_0178_2_02_000012_0_10003_3,attempt_1437098194051_0178_2_02_000031_0_10006_0,attempt_1437098194051_0178_2_02_000031_0_10006_1,attempt_1437098194051_0178_2_02_000031_0_10006_2,attempt_1437098194051_0178_2_02_000031_0_10006_3,attempt_1437098194051_0178_2_02_000031_0_10006_4,attempt_1437098194051_0178_2_02_000050_0_10009_0,attempt_1437098194051_0178_2_02_000050_0_10009_1,attempt_1437098194051_0178_2_02_000050_0_10009_2,attempt_1437098194051_0178_2_02_000050_0_10009_3,attempt_1437098194051_0178_2_02_000069_0_10012_0,attempt_1437098194051_0178_2_02_000088_0_10033_0,attempt_1437098194051_0178_2_02_000107_0_10033_0,attempt_1437098194051_0178_2_02_000126_0_10006_0,attempt_1437098194051_0178_2_02_000069_0_10012_1,attempt_1437098194051_0178_2_02_000088_0_10033_1,attempt_1437098194051_0178_2_02_000145_0_10006_0,attempt_1437098194051_0178_2_02_000107_0_10033_1,attempt_1437098194051_0178_2_02_000126_0_10006_1,attempt_1437098194051_0178_2_02_000069_0_10012_2,attempt_1437098194051_0178_2_02_000069_0_10012_3,attempt_1437098194051_0178_2_02_000145_0_10006_1,attempt_1437098194051_0178_2_02_000088_0_10033_2,attempt_1437098194051_0178_2_02_000107_0_10033_2,attempt_1437098194051_0178_2_02_000126_0_10006_2,attempt_1437098194051_0178_2_02_000164_0_10030_0,attempt_1437098194051_0178_2_02_000183_0_10006_0,attempt_1437098194051_0178_2_02_000107_0_10033_3,attempt_1437098194051_0178_2_02_000145_0_10006_2,attempt_1437098194051_0178_2_02_000088_0_10033_3,attempt_1437098194051_0178_2_02_000088_0_10033_4,attempt_1437098194051_0178_2_02_000202_0_10015_0,attempt_1437098194051_0178_2_02_000145_0_10006_3,attempt_1437098194051_0178_2_02_000126_0_10006_3,attempt_1437098194051_0178_2_02_000126_0_10006_4,attempt_1437098194051_0178_2_02_000164_0_10030_1,attempt_14370
98194051_0178_2_02_000183_0_10006_1,attempt_1437098194051_0178_2_02_000202_0_10015_1,attempt_1437098194051_0178_2_02_000183_0_10006_2,attempt_1437098194051_0178_2_02_000164_0_10030_2,attempt_1437098194051_0178_2_02_000164_0_10030_3,attempt_1437098194051_0178_2_02_000183_0_10006_3,attempt_1437098194051_0178_2_02_000202_0_10015_2,attempt_1437098194051_0178_2_02_000202_0_10015_3,attempt_1437098194051_0178_2_02_000133_0_10036_0,attempt_1437098194051_0178_2_02_000096_0_10012_0,attempt_1437098194051_0178_2_02_000114_0_10009_0,attempt_1437098194051_0178_2_02_000095_0_10009_0,attempt_1437098194051_0178_2_02_000153_0_10041_0,attempt_1437098194051_0178_2_02_000143_0_10036_0,attempt_1437098194051_0178_2_02_000190_0_10015_0,attempt_1437098194051_0178_2_02_000181_0_10042_0,attempt_1437098194051_0178_2_02_000133_0_10036_1,attempt_1437098194051_0178_2_02_000143_0_10036_1,attempt_1437098194051_0178_2_02_000153_0_10041_1,attempt_1437098194051_0178_2_02_000190_0_10015_1,attempt_1437098194051_0178_2_02_000209_0_10018_0,attempt_1437098194051_0178_2_02_000095_0_10009_1,attempt_1437098194051_0178_2_02_000114_0_10009_1,attempt_1437098194051_0178_2_02_000096_0_10012_1,attempt_1437098194051_0178_2_02_000181_0_10042_1,attempt_1437098194051_0178_2_02_000133_0_10036_2,attempt_1437098194051_0178_2_02_000153_0_10041_2,attempt_1437098194051_0178_2_02_000143_0_10036_2,attempt_1437098194051_0178_2_02_000114_0_10009_2,attempt_1437098194051_0178_2_02_000190_0_10015_2,attempt_1437098194051_0178_2_02_000133_0_10036_3,attempt_1437098194051_0178_2_02_000095_0_10009_2,attempt_1437098194051_0178_2_02_000096_0_10012_2,attempt_1437098194051_0178_2_02_000209_0_10018_1,attempt_1437098194051_0178_2_02_000181_0_10042_2,attempt_1437098194051_0178_2_02_000153_0_10041_3,attempt_1437098194051_0178_2_02_000095_0_10009_3,attempt_1437098194051_0178_2_02_000096_0_10012_3,attempt_1437098194051_0178_2_02_000114_0_10009_3,attempt_1437098194051_0178_2_02_000190_0_10015_3,attempt_1437098194051_0178_2_02_000143_0_10036_3,atte
mpt_1437098194051_0178_2_02_000190_0_10015_4,attempt_1437098194051_0178_2_02_000143_0_10036_4,attempt_1437098194051_0178_2_02_000181_0_10042_3,attempt_1437098194051_0178_2_02_000153_0_10041_4,attempt_1437098194051_0178_2_02_000181_0_10042_4,attempt_1437098194051_0178_2_02_000209_0_10018_2,attempt_1437098194051_0178_2_02_000209_0_10018_3&keepAlive=true
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1839)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1440)
at org.apache.tez.http.HttpConnection.getInputStream(HttpConnection.java:248)
at org.apache.tez.runtime.library.common.shuffle.Fetcher.setupConnection(Fetcher.java:441)
at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:470)
at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:403)
at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:199)
at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:71)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{noformat}
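tez.runtime.shuffle.fetch.max.task.output.at.once (default: 20) is honored only
by the ordered fetch. The unordered fetch does not honor it, so a single request
can list an unbounded number of attempts, as in the URL above, which the server
rejected with HTTP 400.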
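
One possible direction, sketched below, is to cap the number of attempts named in a single request, in the spirit of the ordered-fetch limit. This is only an illustration and not the actual Tez change: the class AttemptBatcher and the methods partition and buildUrl are hypothetical names, and the URL layout is simply copied from the log above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Tez: splits the pending attempts for one
// host into batches so that no single request names more than maxPerFetch.
class AttemptBatcher {

  // Partition attempt path components into consecutive batches of at most
  // maxPerFetch entries each.
  static List<List<String>> partition(List<String> attempts, int maxPerFetch) {
    if (maxPerFetch <= 0) {
      throw new IllegalArgumentException("maxPerFetch must be positive");
    }
    List<List<String>> batches = new ArrayList<>();
    for (int start = 0; start < attempts.size(); start += maxPerFetch) {
      int end = Math.min(start + maxPerFetch, attempts.size());
      batches.add(new ArrayList<>(attempts.subList(start, end)));
    }
    return batches;
  }

  // Build a request URL in the shape seen in the log above (assumed layout).
  static String buildUrl(String host, int port, String jobId, int reduce,
                         List<String> batch) {
    return "http://" + host + ":" + port + "/mapOutput?job=" + jobId
        + "&reduce=" + reduce + "&map=" + String.join(",", batch)
        + "&keepAlive=true";
  }

  public static void main(String[] args) {
    // A few attempt ids taken from the log; the limit of 2 is for illustration.
    List<String> attempts = Arrays.asList(
        "attempt_1437098194051_0178_2_02_000012_0_10003_0",
        "attempt_1437098194051_0178_2_02_000012_0_10003_1",
        "attempt_1437098194051_0178_2_02_000012_0_10003_2");
    for (List<String> batch : partition(attempts, 2)) {
      System.out.println(
          buildUrl("machine123", 13562, "job_1437098194051_0178", 279, batch));
    }
  }
}
```

With a limit of 20, the attempts in the failing request would be spread over several shorter URLs instead of one very long one.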
Hive query for reference, run at 10 TB scale: "select p.p_partkey, li.l_suppkey
from (select distinct l_partkey as p_partkey from lineitem) p join lineitem li
on p.p_partkey = li.l_partkey where li.l_linenumber = 1 and li.l_orderkey in
(select l_orderkey from lineitem where l_shipmode = 'AIR') limit 2"