[ 
https://issues.apache.org/jira/browse/TEZ-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529333#comment-14529333
 ] 

Siddharth Seth commented on TEZ-2366:
-------------------------------------

A couple of comments:
- The Fetchers should not be reading the ShuffleMetaData. It would be better to 
pass the port in as a parameter, similar to the way the host is set up. Multiple 
fetchers are created, and reading and parsing the ShuffleMetadata per 
Input is unnecessary. (Also, we've tried to keep the InputContext out of the 
Fetcher.)
- The check for a null env: is this required? LocalContainerLauncher sets 
this up correctly with port 0, so it should never be null, except maybe in 
unit tests.
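
To illustrate the first point, here is a minimal sketch, not the actual Tez 
code. SketchFetcher, SketchShuffleInput, decodePort and the 4-byte metadata 
layout are all hypothetical, made up for this example. The idea is only that 
the owner of the fetchers decodes the port once and hands it to each fetcher 
the same way it hands over the host, so the fetcher never needs the metadata 
or the InputContext.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a fetcher; NOT the real Tez Fetcher class. */
final class SketchFetcher {
    private final String host;
    private final int port;

    // Host and port both arrive as plain parameters; no context object needed.
    SketchFetcher(String host, int port) {
        this.host = host;
        this.port = port;
    }

    String target() {
        return host + ":" + port;
    }
}

/** Hypothetical owner of the fetchers; decodes the shuffle metadata once. */
final class SketchShuffleInput {
    private int decodeCount = 0;
    private final int shufflePort;

    SketchShuffleInput(ByteBuffer shuffleMetaData) {
        this.shufflePort = decodePort(shuffleMetaData);
    }

    // Assumption for this sketch only: the payload is a single 4-byte int port.
    private int decodePort(ByteBuffer meta) {
        decodeCount++;
        return meta.duplicate().getInt();
    }

    // Every fetcher reuses the already-decoded port.
    List<SketchFetcher> createFetchers(List<String> hosts) {
        List<SketchFetcher> fetchers = new ArrayList<>();
        for (String host : hosts) {
            fetchers.add(new SketchFetcher(host, shufflePort));
        }
        return fetchers;
    }

    int decodeCount() {
        return decodeCount;
    }
}

public class PortAsParameterSketch {
    public static void main(String[] args) {
        // 13562 is just an example value for the sketch.
        ByteBuffer meta = ByteBuffer.allocate(4).putInt(0, 13562);
        SketchShuffleInput input = new SketchShuffleInput(meta);
        for (SketchFetcher f : input.createFetchers(Arrays.asList("host-a", "host-b"))) {
            System.out.println(f.target());
        }
        System.out.println("metadata decoded " + input.decodeCount() + " time(s)");
    }
}
```

With this shape, the number of metadata parses stays at one regardless of how 
many fetchers are created.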

Rest looks good.

> Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333
> --------------------------------------------------------------------
>
>                 Key: TEZ-2366
>                 URL: https://issues.apache.org/jira/browse/TEZ-2366
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Daniel Dai
>            Assignee: Prakash Ramachandran
>            Priority: Critical
>         Attachments: TEZ-2366.1.patch, TEZ-2366.2.patch, TEZ-2366.test.txt, 
> TEZ-2366.wip.1.patch
>
>
> Around 20 unit tests (out of around 2000) fail intermittently after 
> TEZ-2333. Here is a stack trace:
> {code}
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find output/attempt_1429899954360_0001_1_01_000000_1_10003/file.out.index in any of the configured local directories
>         at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:449)
>         at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:164)
>         at org.apache.tez.runtime.library.common.shuffle.Fetcher.getShuffleInputFileName(Fetcher.java:611)
>         at org.apache.tez.runtime.library.common.shuffle.Fetcher.getTezIndexRecord(Fetcher.java:591)
>         at org.apache.tez.runtime.library.common.shuffle.Fetcher.doLocalDiskFetch(Fetcher.java:536)
>         at org.apache.tez.runtime.library.common.shuffle.Fetcher.setupLocalDiskFetch(Fetcher.java:517)
>         at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:190)
>         at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:72)
>         at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> To reproduce this in the Pig tests, use the following commands:
> svn co http://svn.apache.org/repos/asf/pig/trunk
> ant -Dhadoopversion=23 -Dtest.exec.type=tez -Dtestcase=TestTezAutoParallelism test
> Note that the Pig codebase already sets TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH to 
> "true" 
> (http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java?view=markup).
> I tried changing TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH to "false" in Pig, but that 
> does not help.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
