[ https://issues.apache.org/jira/browse/TEZ-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rajesh Balamohan updated TEZ-2635:
----------------------------------
Attachment: TEZ-2635.1.patch
tez2635.tar.gz
I was able to reproduce this locally with a small job (500 source tasks, 1
destination task), where the destination task pulls all 500 attempts at once.
The sample job is attached here. If the URL length exceeds 4 KB, the request
fails with HTTP 400 because the server can't handle it. Attaching a simple fix
that limits the number of attempts that can be downloaded in a single fetch, in
both the ordered and unordered cases.
[~sseth], [~gopalv] - Please review when you find time. No test cases attached.
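The fix described above amounts to batching: instead of packing every pending attempt for a host into one `map=` parameter, split them into groups of at most N and issue one request per group. The sketch below is a minimal illustration under stated assumptions, not the code in TEZ-2635.1.patch. The class and method names (`BatchedFetchSketch`, `partition`, `mapOutputUrl`) are hypothetical, and the URL shape and 48-character attempt IDs are copied from the log in the quoted description.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not the TEZ-2635.1.patch code): split the attempts
 * pending for one host into batches of at most maxPerFetch, and build one
 * mapOutput URL per batch, in the shape seen in the Fetcher log.
 */
final class BatchedFetchSketch {
  private BatchedFetchSketch() {}

  /** Splits attempts into consecutive batches of at most maxPerFetch entries. */
  static List<List<String>> partition(List<String> attempts, int maxPerFetch) {
    if (maxPerFetch < 1) {
      throw new IllegalArgumentException("maxPerFetch must be >= 1");
    }
    List<List<String>> batches = new ArrayList<>();
    for (int start = 0; start < attempts.size(); start += maxPerFetch) {
      int end = Math.min(start + maxPerFetch, attempts.size());
      // Copy so each batch is independent of the caller's list.
      batches.add(new ArrayList<>(attempts.subList(start, end)));
    }
    return batches;
  }

  /** Builds a URL with every attempt in the batch joined into map=. */
  static String mapOutputUrl(String host, int port, String jobId, int reduce,
                             List<String> batch) {
    return "http://" + host + ":" + port
        + "/mapOutput?job=" + jobId
        + "&reduce=" + reduce
        + "&map=" + String.join(",", batch)
        + "&keepAlive=true";
  }
}
```

With attempt IDs of the length seen in the log, each extra attempt adds 49 characters (48 plus a comma), so a little over 80 attempts is enough to push the URL past 4 KB. A batch of 20 keeps the URL at roughly 1 KB, while 500 attempts in one request would produce a URL of about 24 KB.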
> Limit number of attempts being downloaded in unordered fetch
> ------------------------------------------------------------
>
> Key: TEZ-2635
> URL: https://issues.apache.org/jira/browse/TEZ-2635
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Rajesh Balamohan
> Attachments: TEZ-2635.1.patch, tez2635.tar.gz
>
>
> {noformat}
> 2015-07-22 23:39:14,221 WARN [Fetcher [Map_3] #4] shuffle.Fetcher: Fetch
> Failure from host while connecting: machine123, attempt:
> InputAttemptIdentifier [inputIdentifier=InputIdentifier [inputIndex=12],
> attemptNumber=0,
> pathComponent=attempt_1437098194051_0178_2_02_000012_0_10003_0,
> fetchTypeInfo=INCREMENTAL_UPDATE, spillEventId=0] Informing ShuffleManager:
> java.io.IOException: Server returned HTTP response code: 400 for URL:
> http://machine123:13562/mapOutput?job=job_1437098194051_0178&reduce=279&map=attempt_1437098194051_0178_2_02_000012_0_10003_0,attempt_1437098194051_0178_2_02_000012_0_10003_1,attempt_1437098194051_0178_2_02_000012_0_10003_2,attempt_1437098194051_0178_2_02_000012_0_10003_3,attempt_1437098194051_0178_2_02_000031_0_10006_0,attempt_1437098194051_0178_2_02_000031_0_10006_1,attempt_1437098194051_0178_2_02_000031_0_10006_2,attempt_1437098194051_0178_2_02_000031_0_10006_3,attempt_1437098194051_0178_2_02_000031_0_10006_4,attempt_1437098194051_0178_2_02_000050_0_10009_0,attempt_1437098194051_0178_2_02_000050_0_10009_1,attempt_1437098194051_0178_2_02_000050_0_10009_2,attempt_1437098194051_0178_2_02_000050_0_10009_3,attempt_1437098194051_0178_2_02_000069_0_10012_0,attempt_1437098194051_0178_2_02_000088_0_10033_0,attempt_1437098194051_0178_2_02_000107_0_10033_0,attempt_1437098194051_0178_2_02_000126_0_10006_0,attempt_1437098194051_0178_2_02_000069_0_10012_1,attempt_1437098194051_0178_2_02_000088_0_10033_1,attempt_1437098194051_0178_2_02_000145_0_10006_0,attempt_1437098194051_0178_2_02_000107_0_10033_1,attempt_1437098194051_0178_2_02_000126_0_10006_1,attempt_1437098194051_0178_2_02_000069_0_10012_2,attempt_1437098194051_0178_2_02_000069_0_10012_3,attempt_1437098194051_0178_2_02_000145_0_10006_1,attempt_1437098194051_0178_2_02_000088_0_10033_2,attempt_1437098194051_0178_2_02_000107_0_10033_2,attempt_1437098194051_0178_2_02_000126_0_10006_2,attempt_1437098194051_0178_2_02_000164_0_10030_0,attempt_1437098194051_0178_2_02_000183_0_10006_0,attempt_1437098194051_0178_2_02_000107_0_10033_3,attempt_1437098194051_0178_2_02_000145_0_10006_2,attempt_1437098194051_0178_2_02_000088_0_10033_3,attempt_1437098194051_0178_2_02_000088_0_10033_4,attempt_1437098194051_0178_2_02_000202_0_10015_0,attempt_1437098194051_0178_2_02_000145_0_10006_3,attempt_1437098194051_0178_2_02_000126_0_10006_3,attempt_1437098194051_0178_2_02_000126_0_10006_4,attempt_1437098194051_0178_2_02_000164_0_10030_1,attempt_1437098194051_0178_2_02_000183_0_10006_1,attempt_1437098194051_0178_2_02_000202_0_10015_1,attempt_1437098194051_0178_2_02_000183_0_10006_2,attempt_1437098194051_0178_2_02_000164_0_10030_2,attempt_1437098194051_0178_2_02_000164_0_10030_3,attempt_1437098194051_0178_2_02_000183_0_10006_3,attempt_1437098194051_0178_2_02_000202_0_10015_2,attempt_1437098194051_0178_2_02_000202_0_10015_3,attempt_1437098194051_0178_2_02_000133_0_10036_0,attempt_1437098194051_0178_2_02_000096_0_10012_0,attempt_1437098194051_0178_2_02_000114_0_10009_0,attempt_1437098194051_0178_2_02_000095_0_10009_0,attempt_1437098194051_0178_2_02_000153_0_10041_0,attempt_1437098194051_0178_2_02_000143_0_10036_0,attempt_1437098194051_0178_2_02_000190_0_10015_0,attempt_1437098194051_0178_2_02_000181_0_10042_0,attempt_1437098194051_0178_2_02_000133_0_10036_1,attempt_1437098194051_0178_2_02_000143_0_10036_1,attempt_1437098194051_0178_2_02_000153_0_10041_1,attempt_1437098194051_0178_2_02_000190_0_10015_1,attempt_1437098194051_0178_2_02_000209_0_10018_0,attempt_1437098194051_0178_2_02_000095_0_10009_1,attempt_1437098194051_0178_2_02_000114_0_10009_1,attempt_1437098194051_0178_2_02_000096_0_10012_1,attempt_1437098194051_0178_2_02_000181_0_10042_1,attempt_1437098194051_0178_2_02_000133_0_10036_2,attempt_1437098194051_0178_2_02_000153_0_10041_2,attempt_1437098194051_0178_2_02_000143_0_10036_2,attempt_1437098194051_0178_2_02_000114_0_10009_2,attempt_1437098194051_0178_2_02_000190_0_10015_2,attempt_1437098194051_0178_2_02_000133_0_10036_3,attempt_1437098194051_0178_2_02_000095_0_10009_2,attempt_1437098194051_0178_2_02_000096_0_10012_2,attempt_1437098194051_0178_2_02_000209_0_10018_1,attempt_1437098194051_0178_2_02_000181_0_10042_2,attempt_1437098194051_0178_2_02_000153_0_10041_3,attempt_1437098194051_0178_2_02_000095_0_10009_3,attempt_1437098194051_0178_2_02_000096_0_10012_3,attempt_1437098194051_0178_2_02_000114_0_10009_3,attempt_1437098194051_0178_2_02_000190_0_10015_3,attempt_1437098194051_0178_2_02_000143_0_10036_3,attempt_1437098194051_0178_2_02_000190_0_10015_4,attempt_1437098194051_0178_2_02_000143_0_10036_4,attempt_1437098194051_0178_2_02_000181_0_10042_3,attempt_1437098194051_0178_2_02_000153_0_10041_4,attempt_1437098194051_0178_2_02_000181_0_10042_4,attempt_1437098194051_0178_2_02_000209_0_10018_2,attempt_1437098194051_0178_2_02_000209_0_10018_3&keepAlive=true
> at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1839)
> at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1440)
> at org.apache.tez.http.HttpConnection.getInputStream(HttpConnection.java:248)
> at org.apache.tez.runtime.library.common.shuffle.Fetcher.setupConnection(Fetcher.java:441)
> at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:470)
> at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:403)
> at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:199)
> at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:71)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> tez.runtime.shuffle.fetch.max.task.output.at.once, which defaults to 20, is
> provided only for the ordered fetch; the unordered case does not honor it.
> [~gopalv] hit this issue while running the following query at 10 TB scale:
> "select p.p_partkey, li.l_suppkey from (select distinct l_partkey as
> p_partkey from lineitem) p join lineitem li on p.p_partkey = li.l_partkey
> where li.l_linenumber = 1 and li.l_orderkey in (select l_orderkey from
> lineitem where l_shipmode = 'AIR') limit 2"
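Since the description says the cap exists only on the ordered path, one way to close the gap is to resolve the limit once and pass the same value to both fetchers. This is a minimal sketch under stated assumptions: the key and the default of 20 come from the description, but `FetchLimitSketch` and `resolve` are hypothetical names, and a plain `Map` stands in for the Tez runtime configuration object.

```java
import java.util.Map;

/**
 * Hypothetical sketch: resolve the per-request attempt cap once and hand the
 * same value to both the ordered and unordered fetch paths. A plain Map
 * stands in for the Tez runtime configuration (assumption).
 */
final class FetchLimitSketch {
  static final String MAX_AT_ONCE_KEY =
      "tez.runtime.shuffle.fetch.max.task.output.at.once";
  // Default quoted in the issue for the ordered case.
  static final int MAX_AT_ONCE_DEFAULT = 20;

  private FetchLimitSketch() {}

  /**
   * Returns the configured cap, or the default when the key is absent.
   * A malformed value throws NumberFormatException; a non-positive value is
   * clamped to 1 so a fetcher always requests at least one attempt.
   */
  static int resolve(Map<String, String> conf) {
    String raw = conf.get(MAX_AT_ONCE_KEY);
    int value = (raw == null) ? MAX_AT_ONCE_DEFAULT : Integer.parseInt(raw.trim());
    return Math.max(1, value);
  }
}
```

Reading the value in one place keeps the ordered and unordered paths from drifting apart again; in real Tez the value would come from the task's runtime configuration rather than a `Map`.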
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)