[ 
https://issues.apache.org/jira/browse/TEZ-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520111#comment-14520111
 ] 

Hitesh Shah commented on TEZ-2366:
----------------------------------

An initial fix could be to simply disable the local disk fetch for the 
mini-cluster. 

For a longer-term fix: exposing the nodeId or some other host identifier 
requires the data movement event to be updated at the source, and that value 
then to be compared against the host identifier of the fetcher's container. 

The YARN node ID should ideally be something that YARN injects into the env; 
for now, we can add it to the env when we launch the container.
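
A rough sketch of the longer-term idea, not the actual Tez change: the 
launcher adds the node ID to the container env, the source side reports its 
node ID with its output, and the fetcher only reads from local disk when the 
two match. The class name, method names, and the env variable 
TEZ_SKETCH_NODE_ID are all hypothetical, and the "host:port" node ID format 
is an assumption for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class LocalFetchCheck {
    // Hypothetical env variable name; not an existing YARN or Tez constant.
    static final String NODE_ID_ENV = "TEZ_SKETCH_NODE_ID";

    // Launch side: return a copy of the container env with the node ID added.
    static Map<String, String> withNodeId(Map<String, String> env, String nodeId) {
        Map<String, String> copy = new HashMap<>(env);
        copy.put(NODE_ID_ENV, nodeId);
        return copy;
    }

    // Fetcher side: allow a local disk fetch only when the node ID reported
    // by the source matches the node ID found in the fetcher's own env.
    static boolean canFetchFromLocalDisk(String sourceNodeId,
                                         Map<String, String> fetcherEnv) {
        String localNodeId = fetcherEnv.get(NODE_ID_ENV);
        return sourceNodeId != null && sourceNodeId.equals(localNodeId);
    }

    public static void main(String[] args) {
        // Two node managers on one host, as a mini-cluster might run them
        // (assumed "host:port" node IDs).
        Map<String, String> fetcherEnv =
            withNodeId(new HashMap<>(), "localhost:45001");

        System.out.println(canFetchFromLocalDisk("localhost:45001", fetcherEnv));
        System.out.println(canFetchFromLocalDisk("localhost:45002", fetcherEnv));
    }
}
```

With a check like this, a fetcher on the same host but under a different 
node manager would fall back to the regular fetch path rather than look for 
file.out.index in its own local directories.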

> Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333
> --------------------------------------------------------------------
>
>                 Key: TEZ-2366
>                 URL: https://issues.apache.org/jira/browse/TEZ-2366
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Daniel Dai
>            Priority: Critical
>         Attachments: TEZ-2366.test.txt, TEZ-2366.wip.1.patch
>
>
> Around 20 unit tests (out of around 2000) fail intermittently after 
> TEZ-2333. Here is a stack trace:
> {code}
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find output/attempt_1429899954360_0001_1_01_000000_1_10003/file.out.index in any of the configured local directories
>         at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:449)
>         at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:164)
>         at org.apache.tez.runtime.library.common.shuffle.Fetcher.getShuffleInputFileName(Fetcher.java:611)
>         at org.apache.tez.runtime.library.common.shuffle.Fetcher.getTezIndexRecord(Fetcher.java:591)
>         at org.apache.tez.runtime.library.common.shuffle.Fetcher.doLocalDiskFetch(Fetcher.java:536)
>         at org.apache.tez.runtime.library.common.shuffle.Fetcher.setupLocalDiskFetch(Fetcher.java:517)
>         at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:190)
>         at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:72)
>         at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> To reproduce this in the Pig tests, use the following commands:
> svn co http://svn.apache.org/repos/asf/pig/trunk
> ant -Dhadoopversion=23 -Dtest.exec.type=tez -Dtestcase=TestTezAutoParallelism test
> Note that the Pig codebase already sets TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH to 
> "true" 
> (http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java?view=markup).
> I tried changing TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH to "false" in Pig, and that 
> does not help. 



