[ https://issues.apache.org/jira/browse/TEZ-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsuyoshi OZAWA updated TEZ-1808:
--------------------------------
    Attachment: TEZ-1808.2.patch

[~sseth], thanks for your review. I am updating the patch based on the integer-counter 
approach.

Does this approach preserve the semantics of fault tolerance?
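For context, a minimal sketch of what a counter-based naming scheme could look like (hypothetical class and method names, not the actual patch): instead of appending `.merged` on every on-disk merge pass, so that the name grows without bound, the merged output keeps a single bounded suffix driven by a counter.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the integer-counter naming scheme; not the TEZ-1808 patch.
public class MergedFileName {
    private final AtomicInteger mergeCounter = new AtomicInteger(0);

    // Old scheme: each merge pass appends ".merged", so the name grows
    // by 7 characters per pass and eventually exceeds the OS file-name limit.
    static String appendSuffix(String base) {
        return base + ".merged";
    }

    // Counter-based scheme: the name stays bounded no matter how many
    // merge passes run, e.g. "spill_215.out.merged.27".
    String nextMergedName(String base) {
        // Strip any previous ".merged.N" suffix so merging an
        // already-merged file does not accumulate suffixes.
        String stripped = base.replaceFirst("\\.merged\\.\\d+$", "");
        return stripped + ".merged." + mergeCounter.incrementAndGet();
    }

    public static void main(String[] args) {
        MergedFileName names = new MergedFileName();
        String name = "spill_215.out";
        for (int pass = 0; pass < 28; pass++) {
            name = names.nextMergedName(name);
        }
        System.out.println(name);  // spill_215.out.merged.28
    }
}
```

With 28 passes the old scheme adds 196 characters of suffix, while the counter scheme adds at most a dozen.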

> Job can fail since name of intermediate files can be too long in specific situation
> -----------------------------------------------------------------------------------
>
>                 Key: TEZ-1808
>                 URL: https://issues.apache.org/jira/browse/TEZ-1808
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Tsuyoshi OZAWA
>            Assignee: Tsuyoshi OZAWA
>         Attachments: TEZ-1808-wip.1.patch, TEZ-1808.1.patch, TEZ-1808.2.patch
>
>
> I ran Hive 0.14 on Tez 0.5.2 and master with MemToMemMerger disabled; this 
> configuration change is the diff between TEZ-1807 and this JIRA. The data is 
> 100GB of text generated by RandomTextWriter.
> {code}
> create external table randomText100GB(
>   text string
> ) location 'hdfs:///user/ozawa/randomText100GB';
> CREATE TABLE wordcount AS
> SELECT word, count(1) AS count
> FROM (
>   SELECT EXPLODE(SPLIT(LCASE(REGEXP_REPLACE(text, '[\\p{Punct},\\p{Cntrl}]', '')), ' ')) AS word
>   FROM randomText100GB
> ) words
> GROUP BY word;
> {code}
> As a result, an exception is thrown:
> {quote}
> --------------------------------------------------------------------------------
>         VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> --------------------------------------------------------------------------------
> Map 1 .........       KILLED    115        110        0        5       0       5
> Reducer 2             FAILED      3          0        0        3       1       2
> --------------------------------------------------------------------------------
> VERTICES: 00/02  [========================>>--] 93%   ELAPSED TIME: 110.95 s
> --------------------------------------------------------------------------------
> Status: Failed
> Vertex failed, vertexName=Reducer 2, vertexId=vertex_1417036912823_0073_1_01, diagnostics=[Task failed, taskId=task_1417036912823_0073_1_01_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: exceptionThrown=org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: error in shuffle in DiskToDiskMerger [Map_1]
>         at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.call(Shuffle.java:338)
>         at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.call(Shuffle.java:319)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.FileNotFoundException: /hadoop1/tmp/nm-local-dir/usercache/ozawa/appcache/application_1417036912823_0073/attempt_1417036912823_0073_1_01_000000_0_10026_spill_215.out.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged (File name too long)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:211)
>         at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:207)
>         at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:270)
>         at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:257)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:773)
>         at org.apache.tez.runtime.library.common.sort.impl.IFile$Writer.<init>(IFile.java:129)
>         at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager$OnDiskMerger.merge(MergeManager.java:702)
>         at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeThread.run(MergeThread.java:89)
> , errorMessage=Shuffle Runner Failed:org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: error in shuffle in DiskToDiskMerger [Map_1]
>         at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.call(Shuffle.java:338)
>         at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.call(Shuffle.java:319)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.FileNotFoundException: /hadoop1/tmp/nm-local-dir/usercache/ozawa/appcache/application_1417036912823_0073/attempt_1417036912823_0073_1_01_000000_0_10026_spill_215.out.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged (File name too long)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:211)
>         at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:207)
>         at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:270)
>         at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:257)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:773)
>         at org.apache.tez.runtime.library.common.sort.impl.IFile$Writer.<init>(IFile.java:129)
>         at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager$OnDiskMerger.merge(MergeManager.java:702)
>         at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeThread.run(MergeThread.java:89)
> ]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex vertex_1417036912823_0073_1_01 [Reducer 2] killed/failed due to:null]
> Vertex killed, vertexName=Map 1, vertexId=vertex_1417036912823_0073_1_00, diagnostics=[Vertex received Kill while in RUNNING state., Vertex killed as other vertex failed. failedTasks:0, Vertex vertex_1417036912823_0073_1_00 [Map 1] killed/failed due to:null]
> DAG failed due to vertex failure. failedVertices:1 killedVertices:1
> FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask
> {quote}
> The file name in this log line looks strange:
> {quote}
> Caused by: java.io.FileNotFoundException: /hadoop1/tmp/nm-local-dir/usercache/ozawa/appcache/application_1417036912823_0073/attempt_1417036912823_0073_1_01_000000_0_10026_spill_215.out.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged (File name too long)
> {quote}
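As a back-of-the-envelope check (assuming the common Linux NAME_MAX limit of 255 bytes per path component), the suffix growth alone explains the failure: the base spill name above is 60 characters, each merge pass appends the 7-character `.merged`, and the failing name carries 28 such suffixes.

```java
// Rough check (not from the patch): the 28th ".merged" suffix is the first
// to push the file name past the typical 255-byte per-component limit.
public class NameLengthCheck {
    public static void main(String[] args) {
        String base = "attempt_1417036912823_0073_1_01_000000_0_10026_spill_215.out";
        StringBuilder name = new StringBuilder(base);
        for (int pass = 1; pass <= 28; pass++) {
            name.append(".merged");  // one suffix per on-disk merge pass
        }
        System.out.println(base.length());  // 60
        System.out.println(name.length());  // 60 + 28 * 7 = 256, one byte over 255
    }
}
```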



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
