[jira] [Commented] (TEZ-2366) Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333

Hitesh Shah (JIRA) Mon, 27 Apr 2015 09:53:45 -0700

    [ https://issues.apache.org/jira/browse/TEZ-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514429#comment-14514429 ]


Hitesh Shah commented on TEZ-2366:
----------------------------------

No, it is not correct to rely on the default. Also, a better change is to verify 
whether the fetch source is on the same NodeManager (i.e. a matching nodeId). 

In any case, a simpler short-term fix is to disable local fetch for the 
minicluster, but in the long term we should fix this properly for the case 
where multiple NMs run on the same host, for both the local and shared fetch 
optimizations. 
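
To illustrate the nodeId point, here is a minimal, self-contained sketch. It is 
not the Tez Fetcher code: the class LocalFetchSketch, the NodeKey type (a 
stand-in for a YARN NodeId, i.e. host plus NM port) and both method names are 
hypothetical, and it assumes the problematic check compares hosts only. With 
two NMs on one host, as in a minicluster, a host-only comparison treats remote 
output as local, while a nodeId comparison does not.

```java
import java.util.Objects;

/** Hypothetical sketch; names and types are illustrative, not Tez APIs. */
public final class LocalFetchSketch {

    /** Stand-in for a NodeManager identity: host plus NM port. */
    static final class NodeKey {
        final String host;
        final int port;

        NodeKey(String host, int port) {
            this.host = host;
            this.port = port;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof NodeKey)) {
                return false;
            }
            NodeKey other = (NodeKey) o;
            return port == other.port && host.equals(other.host);
        }

        @Override
        public int hashCode() {
            return Objects.hash(host, port);
        }
    }

    /** Host-only comparison: cannot tell apart two NMs on the same host. */
    static boolean sameHost(NodeKey fetcher, NodeKey producer) {
        return fetcher.host.equals(producer.host);
    }

    /**
     * Local fetch is allowed only when the optimization is enabled and the
     * producer ran on the very same NodeManager (matching host and port).
     */
    static boolean canFetchLocally(boolean localFetchEnabled,
                                   NodeKey fetcher, NodeKey producer) {
        return localFetchEnabled && fetcher.equals(producer);
    }

    public static void main(String[] args) {
        // Two NodeManagers on one host, as a minicluster would start them.
        NodeKey nm1 = new NodeKey("localhost", 45454);
        NodeKey nm2 = new NodeKey("localhost", 45455);

        System.out.println(sameHost(nm1, nm2));               // true
        System.out.println(canFetchLocally(true, nm1, nm2));  // false
        System.out.println(canFetchLocally(true, nm1, nm1));  // true
        System.out.println(canFetchLocally(false, nm1, nm1)); // false
    }
}
```

The boolean flag plays the role of the short-term fix: passing false disables 
local fetch entirely, whichever NodeManager produced the output.
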



> Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333
> --------------------------------------------------------------------
>
>                 Key: TEZ-2366
>                 URL: https://issues.apache.org/jira/browse/TEZ-2366
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Daniel Dai
>            Priority: Critical
>         Attachments: TEZ-2366.test.txt, TEZ-2366.wip.1.patch
>
>
> Around 20 unit tests (out of around 2000) fail intermittently after 
> TEZ-2333. Here is a stack trace:
> {code}
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find output/attempt_1429899954360_0001_1_01_000000_1_10003/file.out.index in any of the configured local directories
>         at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:449)
>         at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:164)
>         at org.apache.tez.runtime.library.common.shuffle.Fetcher.getShuffleInputFileName(Fetcher.java:611)
>         at org.apache.tez.runtime.library.common.shuffle.Fetcher.getTezIndexRecord(Fetcher.java:591)
>         at org.apache.tez.runtime.library.common.shuffle.Fetcher.doLocalDiskFetch(Fetcher.java:536)
>         at org.apache.tez.runtime.library.common.shuffle.Fetcher.setupLocalDiskFetch(Fetcher.java:517)
>         at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:190)
>         at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:72)
>         at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> To reproduce this in the Pig tests, use the following commands:
> svn co http://svn.apache.org/repos/asf/pig/trunk
> ant -Dhadoopversion=23 -Dtest.exec.type=tez -Dtestcase=TestTezAutoParallelism test
> Note that the Pig codebase already sets TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH to 
> "true" 
> (http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java?view=markup).
> I tried changing TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH to "false" in Pig, but that 
> does not help. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
