[ https://issues.apache.org/jira/browse/TEZ-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234768#comment-14234768 ]
Tsuyoshi OZAWA commented on TEZ-1808:
-------------------------------------

Thanks a lot!

> Job can fail since name of intermediate files can be too long in specific situation
> -----------------------------------------------------------------------------------
>
>                 Key: TEZ-1808
>                 URL: https://issues.apache.org/jira/browse/TEZ-1808
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Tsuyoshi OZAWA
>            Assignee: Tsuyoshi OZAWA
>             Fix For: 0.5.3
>
>         Attachments: TEZ-1808-wip.1.patch, TEZ-1808.1.patch, TEZ-1808.2.patch, TEZ-1808.3.patch
>
>
> I ran Hive 0.14 on Tez 0.5.2 and on master with MemToMemMerger disabled. This configuration change is the difference between TEZ-1807 and this JIRA. The input is 100 GB of text generated by RandomTextWriter.
> {code}
> create external table randomText100GB(
>   text string
> ) location 'hdfs:///user/ozawa/randomText100GB';
> CREATE TABLE wordcount AS
> SELECT word, count(1) AS count
> FROM (SELECT EXPLODE(SPLIT(LCASE(REGEXP_REPLACE(text,'[\\p{Punct},\\p{Cntrl}]','')),' ')) AS word FROM randomText100GB) words
> GROUP BY word;
> {code}
> As a result, an exception is thrown:
> {quote}
> --------------------------------------------------------------------------------
>         VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> --------------------------------------------------------------------------------
> Map 1 .........       KILLED    115        110        0        5       0       5
> Reducer 2             FAILED      3          0        0        3       1       2
> --------------------------------------------------------------------------------
> VERTICES: 00/02  [========================>>--] 93%   ELAPSED TIME: 110.95 s
> --------------------------------------------------------------------------------
> Status: Failed
> Vertex failed, vertexName=Reducer 2, vertexId=vertex_1417036912823_0073_1_01, diagnostics=[Task failed, taskId=task_1417036912823_0073_1_01_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: exceptionThrown=org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: error in shuffle in DiskToDiskMerger [Map_1]
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.call(Shuffle.java:338)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.call(Shuffle.java:319)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.FileNotFoundException: /hadoop1/tmp/nm-local-dir/usercache/ozawa/appcache/application_1417036912823_0073/attempt_1417036912823_0073_1_01_000000_0_10026_spill_215.out.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged (File name too long)
>     at java.io.FileOutputStream.open(Native Method)
>     at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>     at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:211)
>     at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:207)
>     at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:270)
>     at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:257)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:773)
>     at org.apache.tez.runtime.library.common.sort.impl.IFile$Writer.<init>(IFile.java:129)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager$OnDiskMerger.merge(MergeManager.java:702)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeThread.run(MergeThread.java:89)
> , errorMessage=Shuffle Runner Failed:org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: error in shuffle in DiskToDiskMerger [Map_1]
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.call(Shuffle.java:338)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.call(Shuffle.java:319)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.FileNotFoundException: /hadoop1/tmp/nm-local-dir/usercache/ozawa/appcache/application_1417036912823_0073/attempt_1417036912823_0073_1_01_000000_0_10026_spill_215.out.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged (File name too long)
>     at java.io.FileOutputStream.open(Native Method)
>     at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>     at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:211)
>     at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:207)
>     at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:270)
>     at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:257)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:773)
>     at org.apache.tez.runtime.library.common.sort.impl.IFile$Writer.<init>(IFile.java:129)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager$OnDiskMerger.merge(MergeManager.java:702)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeThread.run(MergeThread.java:89)
> ]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex vertex_1417036912823_0073_1_01 [Reducer 2] killed/failed due to:null]
> Vertex killed, vertexName=Map 1, vertexId=vertex_1417036912823_0073_1_00, diagnostics=[Vertex received Kill while in RUNNING state., Vertex killed as other vertex failed. failedTasks:0, Vertex vertex_1417036912823_0073_1_00 [Map 1] killed/failed due to:null]
> DAG failed due to vertex failure. failedVertices:1 killedVertices:1
> FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask
> {quote}
> The log message on this line looks strange:
> {quote}
> Caused by: java.io.FileNotFoundException: /hadoop1/tmp/nm-local-dir/usercache/ozawa/appcache/application_1417036912823_0073/attempt_1417036912823_0073_1_01_000000_0_10026_spill_215.out.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged (File name too long)
> {quote}
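
A rough sketch of why the file name in the trace above ends in "File name too long". It rests on two assumptions that are not stated in the report: that each disk-to-disk merge pass appends ".merged" to the name of one of its input files (inferred only from the shape of the name in the trace), and that the local filesystem limits a single path component to 255 bytes (NAME_MAX on common Linux filesystems such as ext4). The helper names are hypothetical and are not part of Tez.

```python
# Assumption: a single path component may be at most 255 bytes
# (NAME_MAX on common Linux filesystems such as ext4).
NAME_MAX = 255

# File name (last path component) taken from the trace, before any ".merged".
BASE_NAME = "attempt_1417036912823_0073_1_01_000000_0_10026_spill_215.out"

# Assumption: every disk-to-disk merge pass appends this suffix.
SUFFIX = ".merged"


def merged_name(base: str, passes: int) -> str:
    """Hypothetical helper: the name after `passes` chained merge passes."""
    return base + SUFFIX * passes


def first_failing_pass(base: str, limit: int = NAME_MAX) -> int:
    """Return the first pass count whose name no longer fits in `limit` bytes."""
    passes = 0
    while len(merged_name(base, passes).encode("utf-8")) <= limit:
        passes += 1
    return passes


if __name__ == "__main__":
    n = first_failing_pass(BASE_NAME)
    print(n, len(merged_name(BASE_NAME, n)))
```

Under these assumptions the 60-byte base name grows by 7 bytes per pass and crosses the limit on the 28th pass, at 256 bytes. The name length depends on how many merge passes a reducer happens to need, which would explain why the failure only shows up in specific situations such as large inputs.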