[ 
https://issues.apache.org/jira/browse/TEZ-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648348#comment-14648348
 ] 

Bikas Saha commented on TEZ-2635:
---------------------------------

Minor. The value of includedMaps would be wrong for any code that uses it as 
the actual number of maps included in the fetch. It is wrong only in the 
overflow case (not the common case), so a test for such a use case may not 
catch the problem. It would be good to stop incrementing the value once we 
stop including more maps.
{code}
+      // Check if max threshold is met
+      if (includedMaps++ >= maxTaskOutputAtOnce) {
+        inputIter.remove();
+        inputHost.addKnownInput(input); //add to inputHost
+      }
{code}
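
A minimal sketch of the suggestion, assuming a simplified loop: the counter is incremented only when an input is actually kept, so it never exceeds the cap. The class name, the capInputs helper, and the String stand-ins for inputs are hypothetical and are not part of the patch or of Tez; only includedMaps, maxTaskOutputAtOnce and inputIter come from the snippet above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the fetcher's bookkeeping; not Tez code.
class IncludedMapsSketch {

    // Keeps at most maxTaskOutputAtOnce inputs in 'pending', moves the rest
    // to 'deferred', and returns the number of inputs actually kept.
    static int capInputs(List<String> pending, List<String> deferred,
                         int maxTaskOutputAtOnce) {
        int includedMaps = 0;
        Iterator<String> inputIter = pending.iterator();
        while (inputIter.hasNext()) {
            String input = inputIter.next();
            if (includedMaps >= maxTaskOutputAtOnce) {
                // Over the cap: defer the input and leave the counter alone.
                inputIter.remove();
                deferred.add(input); // stands in for inputHost.addKnownInput(input)
            } else {
                // Under the cap: the input stays in this fetch, so count it.
                includedMaps++;
            }
        }
        return includedMaps;
    }

    public static void main(String[] args) {
        List<String> pending = new ArrayList<>(List.of("a", "b", "c", "d", "e"));
        List<String> deferred = new ArrayList<>();
        int includedMaps = capInputs(pending, deferred, 3);
        System.out.println("includedMaps=" + includedMaps
                + " pending=" + pending + " deferred=" + deferred);
    }
}
```

With the post-increment inside the condition, the same five inputs and a cap of 3 would leave the counter at 5 even though only 3 inputs remain in the fetch, which is the mismatch described above.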

> Limit number of attempts being downloaded in unordered fetch
> ------------------------------------------------------------
>
>                 Key: TEZ-2635
>                 URL: https://issues.apache.org/jira/browse/TEZ-2635
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: TEZ-2635.1.patch, TEZ-2635.2.patch, tez2635.tar.gz
>
>
> {noformat}
> 2015-07-22 23:39:14,221 WARN [Fetcher [Map_3] #4] shuffle.Fetcher: Fetch 
> Failure from host while connecting: machine123, attempt: 
> InputAttemptIdentifier [inputIdentifier=InputIdentifier [inputIndex=12], 
> attemptNumber=0, 
> pathComponent=attempt_1437098194051_0178_2_02_000012_0_10003_0, 
> fetchTypeInfo=INCREMENTAL_UPDATE, spillEventId=0] Informing ShuffleManager: 
> java.io.IOException: Server returned HTTP response code: 400 for URL: 
> http://machine123:13562/mapOutput?job=job_1437098194051_0178&reduce=279&map=attempt_1437098194051_0178_2_02_000012_0_10003_0,attempt_1437098194051_0178_2_02_000012_0_10003_1,attempt_1437098194051_0178_2_02_000012_0_10003_2,attempt_1437098194051_0178_2_02_000012_0_10003_3,attempt_1437098194051_0178_2_02_000031_0_10006_0,attempt_1437098194051_0178_2_02_000031_0_10006_1,attempt_1437098194051_0178_2_02_000031_0_10006_2,attempt_1437098194051_0178_2_02_000031_0_10006_3,attempt_1437098194051_0178_2_02_000031_0_10006_4,attempt_1437098194051_0178_2_02_000050_0_10009_0,attempt_1437098194051_0178_2_02_000050_0_10009_1,attempt_1437098194051_0178_2_02_000050_0_10009_2,attempt_1437098194051_0178_2_02_000050_0_10009_3,attempt_1437098194051_0178_2_02_000069_0_10012_0,attempt_1437098194051_0178_2_02_000088_0_10033_0,attempt_1437098194051_0178_2_02_000107_0_10033_0,attempt_1437098194051_0178_2_02_000126_0_10006_0,attempt_1437098194051_0178_2_02_000069_0_10012_1,attempt_1437098194051_0178_2_02_000088_0_10033_1,attempt_1437098194051_0178_2_02_000145_0_10006_0,attempt_1437098194051_0178_2_02_000107_0_10033_1,attempt_1437098194051_0178_2_02_000126_0_10006_1,attempt_1437098194051_0178_2_02_000069_0_10012_2,attempt_1437098194051_0178_2_02_000069_0_10012_3,attempt_1437098194051_0178_2_02_000145_0_10006_1,attempt_1437098194051_0178_2_02_000088_0_10033_2,attempt_1437098194051_0178_2_02_000107_0_10033_2,attempt_1437098194051_0178_2_02_000126_0_10006_2,attempt_1437098194051_0178_2_02_000164_0_10030_0,attempt_1437098194051_0178_2_02_000183_0_10006_0,attempt_1437098194051_0178_2_02_000107_0_10033_3,attempt_1437098194051_0178_2_02_000145_0_10006_2,attempt_1437098194051_0178_2_02_000088_0_10033_3,attempt_1437098194051_0178_2_02_000088_0_10033_4,attempt_1437098194051_0178_2_02_000202_0_10015_0,attempt_1437098194051_0178_2_02_000145_0_10006_3,attempt_1437098194051_0178_2_02_000126_0_10006_3,attempt_1437098194051_0178_2_02_000126_0_10006_4,attempt_1437098194051_0178_2_02_000164_0_10030_1,attempt_143
7098194051_0178_2_02_000183_0_10006_1,attempt_1437098194051_0178_2_02_000202_0_10015_1,attempt_1437098194051_0178_2_02_000183_0_10006_2,attempt_1437098194051_0178_2_02_000164_0_10030_2,attempt_1437098194051_0178_2_02_000164_0_10030_3,attempt_1437098194051_0178_2_02_000183_0_10006_3,attempt_1437098194051_0178_2_02_000202_0_10015_2,attempt_1437098194051_0178_2_02_000202_0_10015_3,attempt_1437098194051_0178_2_02_000133_0_10036_0,attempt_1437098194051_0178_2_02_000096_0_10012_0,attempt_1437098194051_0178_2_02_000114_0_10009_0,attempt_1437098194051_0178_2_02_000095_0_10009_0,attempt_1437098194051_0178_2_02_000153_0_10041_0,attempt_1437098194051_0178_2_02_000143_0_10036_0,attempt_1437098194051_0178_2_02_000190_0_10015_0,attempt_1437098194051_0178_2_02_000181_0_10042_0,attempt_1437098194051_0178_2_02_000133_0_10036_1,attempt_1437098194051_0178_2_02_000143_0_10036_1,attempt_1437098194051_0178_2_02_000153_0_10041_1,attempt_1437098194051_0178_2_02_000190_0_10015_1,attempt_1437098194051_0178_2_02_000209_0_10018_0,attempt_1437098194051_0178_2_02_000095_0_10009_1,attempt_1437098194051_0178_2_02_000114_0_10009_1,attempt_1437098194051_0178_2_02_000096_0_10012_1,attempt_1437098194051_0178_2_02_000181_0_10042_1,attempt_1437098194051_0178_2_02_000133_0_10036_2,attempt_1437098194051_0178_2_02_000153_0_10041_2,attempt_1437098194051_0178_2_02_000143_0_10036_2,attempt_1437098194051_0178_2_02_000114_0_10009_2,attempt_1437098194051_0178_2_02_000190_0_10015_2,attempt_1437098194051_0178_2_02_000133_0_10036_3,attempt_1437098194051_0178_2_02_000095_0_10009_2,attempt_1437098194051_0178_2_02_000096_0_10012_2,attempt_1437098194051_0178_2_02_000209_0_10018_1,attempt_1437098194051_0178_2_02_000181_0_10042_2,attempt_1437098194051_0178_2_02_000153_0_10041_3,attempt_1437098194051_0178_2_02_000095_0_10009_3,attempt_1437098194051_0178_2_02_000096_0_10012_3,attempt_1437098194051_0178_2_02_000114_0_10009_3,attempt_1437098194051_0178_2_02_000190_0_10015_3,attempt_1437098194051_0178_2_02_000143_0_10036_3,at
tempt_1437098194051_0178_2_02_000190_0_10015_4,attempt_1437098194051_0178_2_02_000143_0_10036_4,attempt_1437098194051_0178_2_02_000181_0_10042_3,attempt_1437098194051_0178_2_02_000153_0_10041_4,attempt_1437098194051_0178_2_02_000181_0_10042_4,attempt_1437098194051_0178_2_02_000209_0_10018_2,attempt_1437098194051_0178_2_02_000209_0_10018_3&keepAlive=true
>       at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1839)
>       at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1440)
>       at org.apache.tez.http.HttpConnection.getInputStream(HttpConnection.java:248)
>       at org.apache.tez.runtime.library.common.shuffle.Fetcher.setupConnection(Fetcher.java:441)
>       at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:470)
>       at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:403)
>       at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:199)
>       at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:71)
>       at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> {noformat}
> tez.runtime.shuffle.fetch.max.task.output.at.once, which defaults to 20, is 
> honored only for ordered fetch. The unordered case does not honor it, so a 
> single fetch URL can list an unbounded number of attempts, as in the log above.
> [~gopalv] hit this issue when executing "select p.p_partkey, li.l_suppkey 
> from (select distinct l_partkey as p_partkey from lineitem) p join lineitem 
> li on p.p_partkey = li.l_partkey where li.l_linenumber = 1 and li.l_orderkey 
> in (select l_orderkey from lineitem where l_shipmode = 'AIR') limit 2" at 
> 10 TB scale.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
