[
https://issues.apache.org/jira/browse/TEZ-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639352#comment-14639352
]
Siddharth Seth commented on TEZ-2635:
-------------------------------------
{code}if (includedMaps++ >= maxTaskOutputAtOnce) {{code}
I think this will end up ignoring the inputs that get removed because they are
already in completedInputs or obsoletedInputs. Those removals should be
factored into the count.
Also, I'm not sure how the host actually makes it back into the pendingQueue.
The loop that decides which host to fetch from relies on picking up hosts from
'pendingHosts', which is not repopulated when the inputs are added back to the
host.
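To make both points concrete, here is a minimal sketch of one way the counting
and re-queueing could fit together. It is not the code from the patch or from
Tez: the class BoundedFetchSketch, its Host type, and the methods pickInputs,
markCompleted, markObsolete, pollPendingHost and pendingHostCount are all
hypothetical names, and it assumes that "factoring the removes into the
counting" means skipped inputs should not count against maxTaskOutputAtOnce.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Queue;
import java.util.Set;

/** Hypothetical sketch only; names do not correspond to the Tez classes. */
final class BoundedFetchSketch {

  /** Stand-in for a host with a list of inputs still to be fetched. */
  static final class Host {
    final String name;
    final List<String> pendingInputs = new ArrayList<>();

    Host(String name) {
      this.name = name;
    }
  }

  private final int maxTaskOutputAtOnce;
  private final Set<String> completedInputs = new HashSet<>();
  private final Set<String> obsoletedInputs = new HashSet<>();
  private final Queue<Host> pendingHosts = new ArrayDeque<>();

  BoundedFetchSketch(int maxTaskOutputAtOnce) {
    this.maxTaskOutputAtOnce = maxTaskOutputAtOnce;
  }

  void markCompleted(String input) {
    completedInputs.add(input);
  }

  void markObsolete(String input) {
    obsoletedInputs.add(input);
  }

  /** Picks at most maxTaskOutputAtOnce inputs that still need fetching. */
  List<String> pickInputs(Host host) {
    List<String> picked = new ArrayList<>();
    Iterator<String> it = host.pendingInputs.iterator();
    // Bound the loop by what was actually picked, not by inputs visited.
    while (it.hasNext() && picked.size() < maxTaskOutputAtOnce) {
      String input = it.next();
      it.remove();
      if (completedInputs.contains(input) || obsoletedInputs.contains(input)) {
        continue; // dropped without counting against the limit
      }
      picked.add(input);
    }
    // Inputs left on the host: put the host back so the scheduling loop
    // that polls pendingHosts sees it again.
    if (!host.pendingInputs.isEmpty()) {
      pendingHosts.add(host);
    }
    return picked;
  }

  Host pollPendingHost() {
    return pendingHosts.poll();
  }

  int pendingHostCount() {
    return pendingHosts.size();
  }
}
```

With a limit of 2 and one already-completed input at the head of the list, the
first call would still return two fetchable inputs, and the host would be
queued again for whatever remains.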
> Limit number of attempts being downloaded in unordered fetch
> ------------------------------------------------------------
>
> Key: TEZ-2635
> URL: https://issues.apache.org/jira/browse/TEZ-2635
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Attachments: TEZ-2635.1.patch, tez2635.tar.gz
>
>
> {noformat}
> 2015-07-22 23:39:14,221 WARN [Fetcher [Map_3] #4] shuffle.Fetcher: Fetch
> Failure from host while connecting: machine123, attempt:
> InputAttemptIdentifier [inputIdentifier=InputIdentifier [inputIndex=12],
> attemptNumber=0,
> pathComponent=attempt_1437098194051_0178_2_02_000012_0_10003_0,
> fetchTypeInfo=INCREMENTAL_UPDATE, spillEventId=0] Informing ShuffleManager:
> java.io.IOException: Server returned HTTP response code: 400 for URL:
> http://machine123:13562/mapOutput?job=job_1437098194051_0178&reduce=279&map=attempt_1437098194051_0178_2_02_000012_0_10003_0,attempt_1437098194051_0178_2_02_000012_0_10003_1,attempt_1437098194051_0178_2_02_000012_0_10003_2,attempt_1437098194051_0178_2_02_000012_0_10003_3,attempt_1437098194051_0178_2_02_000031_0_10006_0,attempt_1437098194051_0178_2_02_000031_0_10006_1,attempt_1437098194051_0178_2_02_000031_0_10006_2,attempt_1437098194051_0178_2_02_000031_0_10006_3,attempt_1437098194051_0178_2_02_000031_0_10006_4,attempt_1437098194051_0178_2_02_000050_0_10009_0,attempt_1437098194051_0178_2_02_000050_0_10009_1,attempt_1437098194051_0178_2_02_000050_0_10009_2,attempt_1437098194051_0178_2_02_000050_0_10009_3,attempt_1437098194051_0178_2_02_000069_0_10012_0,attempt_1437098194051_0178_2_02_000088_0_10033_0,attempt_1437098194051_0178_2_02_000107_0_10033_0,attempt_1437098194051_0178_2_02_000126_0_10006_0,attempt_1437098194051_0178_2_02_000069_0_10012_1,attempt_1437098194051_0178_2_02_000088_0_10033_1,attempt_1437098194051_0178_2_02_000145_0_10006_0,attempt_1437098194051_0178_2_02_000107_0_10033_1,attempt_1437098194051_0178_2_02_000126_0_10006_1,attempt_1437098194051_0178_2_02_000069_0_10012_2,attempt_1437098194051_0178_2_02_000069_0_10012_3,attempt_1437098194051_0178_2_02_000145_0_10006_1,attempt_1437098194051_0178_2_02_000088_0_10033_2,attempt_1437098194051_0178_2_02_000107_0_10033_2,attempt_1437098194051_0178_2_02_000126_0_10006_2,attempt_1437098194051_0178_2_02_000164_0_10030_0,attempt_1437098194051_0178_2_02_000183_0_10006_0,attempt_1437098194051_0178_2_02_000107_0_10033_3,attempt_1437098194051_0178_2_02_000145_0_10006_2,attempt_1437098194051_0178_2_02_000088_0_10033_3,attempt_1437098194051_0178_2_02_000088_0_10033_4,attempt_1437098194051_0178_2_02_000202_0_10015_0,attempt_1437098194051_0178_2_02_000145_0_10006_3,attempt_1437098194051_0178_2_02_000126_0_10006_3,attempt_1437098194051_0178_2_02_000126_0_10006_4,attempt_1437098194051_0178_2_02_000164_0_10030_1,attempt_143
7098194051_0178_2_02_000183_0_10006_1,attempt_1437098194051_0178_2_02_000202_0_10015_1,attempt_1437098194051_0178_2_02_000183_0_10006_2,attempt_1437098194051_0178_2_02_000164_0_10030_2,attempt_1437098194051_0178_2_02_000164_0_10030_3,attempt_1437098194051_0178_2_02_000183_0_10006_3,attempt_1437098194051_0178_2_02_000202_0_10015_2,attempt_1437098194051_0178_2_02_000202_0_10015_3,attempt_1437098194051_0178_2_02_000133_0_10036_0,attempt_1437098194051_0178_2_02_000096_0_10012_0,attempt_1437098194051_0178_2_02_000114_0_10009_0,attempt_1437098194051_0178_2_02_000095_0_10009_0,attempt_1437098194051_0178_2_02_000153_0_10041_0,attempt_1437098194051_0178_2_02_000143_0_10036_0,attempt_1437098194051_0178_2_02_000190_0_10015_0,attempt_1437098194051_0178_2_02_000181_0_10042_0,attempt_1437098194051_0178_2_02_000133_0_10036_1,attempt_1437098194051_0178_2_02_000143_0_10036_1,attempt_1437098194051_0178_2_02_000153_0_10041_1,attempt_1437098194051_0178_2_02_000190_0_10015_1,attempt_1437098194051_0178_2_02_000209_0_10018_0,attempt_1437098194051_0178_2_02_000095_0_10009_1,attempt_1437098194051_0178_2_02_000114_0_10009_1,attempt_1437098194051_0178_2_02_000096_0_10012_1,attempt_1437098194051_0178_2_02_000181_0_10042_1,attempt_1437098194051_0178_2_02_000133_0_10036_2,attempt_1437098194051_0178_2_02_000153_0_10041_2,attempt_1437098194051_0178_2_02_000143_0_10036_2,attempt_1437098194051_0178_2_02_000114_0_10009_2,attempt_1437098194051_0178_2_02_000190_0_10015_2,attempt_1437098194051_0178_2_02_000133_0_10036_3,attempt_1437098194051_0178_2_02_000095_0_10009_2,attempt_1437098194051_0178_2_02_000096_0_10012_2,attempt_1437098194051_0178_2_02_000209_0_10018_1,attempt_1437098194051_0178_2_02_000181_0_10042_2,attempt_1437098194051_0178_2_02_000153_0_10041_3,attempt_1437098194051_0178_2_02_000095_0_10009_3,attempt_1437098194051_0178_2_02_000096_0_10012_3,attempt_1437098194051_0178_2_02_000114_0_10009_3,attempt_1437098194051_0178_2_02_000190_0_10015_3,attempt_1437098194051_0178_2_02_000143_0_10036_3,at
tempt_1437098194051_0178_2_02_000190_0_10015_4,attempt_1437098194051_0178_2_02_000143_0_10036_4,attempt_1437098194051_0178_2_02_000181_0_10042_3,attempt_1437098194051_0178_2_02_000153_0_10041_4,attempt_1437098194051_0178_2_02_000181_0_10042_4,attempt_1437098194051_0178_2_02_000209_0_10018_2,attempt_1437098194051_0178_2_02_000209_0_10018_3&keepAlive=true
> at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1839)
> at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1440)
> at org.apache.tez.http.HttpConnection.getInputStream(HttpConnection.java:248)
> at org.apache.tez.runtime.library.common.shuffle.Fetcher.setupConnection(Fetcher.java:441)
> at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:470)
> at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:403)
> at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:199)
> at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:71)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> tez.runtime.shuffle.fetch.max.task.output.at.once is honored only by the
> ordered fetch, where it defaults to 20. The unordered fetch does not honor
> it.
> [~gopalv] hit this issue when executing "select p.p_partkey, li.l_suppkey
> from (select distinct l_partkey as p_partkey from lineitem) p join lineitem
> li on p.p_partkey = li.l_partkey where li.l_linenumber = 1 and li.l_orderkey
> in (select l_orderkey from lineitem where l_shipmode = 'AIR') limit 2" at
> 10 TB scale.