[ https://issues.apache.org/jira/browse/HADOOP-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579017#action_12579017 ]
Koji Noguchi commented on HADOOP-2893:
--------------------------------------

One user reported this problem: two checksum errors happened on the same node.

The checksum failures happened at different times. On a single node:

Task task_200803121849_0433_r_000208_0:
2008-03-15 05:28:35,738 INFO org.apache.hadoop.fs.FSInputChecker: Found checksum error: org.apache.hadoop.fs.ChecksumException: Checksum error: /tmps/3//mapred-tt/mapred-local/task_200803121849_0433_r_000208_0/map_8850.out at 8437760

and Task task_200803121849_0433_r_000047_0:
2008-03-15 05:27:11,025 INFO org.apache.hadoop.fs.FSInputChecker: Found checksum error: org.apache.hadoop.fs.ChecksumException: Checksum error: /tmps/3//mapred-tt/mapred-local/task_200803121849_0433_r_000047_0/map_10813.out at 4214784

but the merging of these files seems to have happened earlier, at almost the same time on the same disk:

2008-03-15 05:18:31,720 INFO org.apache.hadoop.mapred.ReduceTask: task_200803121849_0433_r_000208_0 Merge of the 786 files in InMemoryFileSystem complete. Local file is /tmps/3/mapred-tt/mapred-local/task_200803121849_0433_r_000208_0/map_8850.out

and

2008-03-15 05:18:26,157 INFO org.apache.hadoop.mapred.ReduceTask: task_200803121849_0433_r_000047_0 Merge of the 788 files in InMemoryFileSystem complete. Local file is /tmps/3/mapred-tt/mapred-local/task_200803121849_0433_r_000047_0/map_10813.out

Could this be related?
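For context on where the "at 8437760" / "at 4214784" offsets come from: checksummed streams verify data in fixed-size chunks against stored CRC values, and the exception reports the byte offset of the chunk that failed verification. The sketch below is a minimal illustration of that idea only; it is not the actual FSInputChecker code, and the class name, chunk size constant, and message format are assumptions made for the example. If two reducers really were merging to the same disk at the same time, a bad write under contention would surface later as exactly this kind of offset-specific mismatch when the file is read back.

{code}
import java.io.IOException;
import java.util.zip.CRC32;

/**
 * Minimal sketch only -- NOT the actual FSInputChecker implementation.
 * Data is verified in fixed-size chunks, each guarded by a stored CRC value;
 * a mismatch is reported with the byte offset of the failing chunk.
 */
public class ChunkChecksumSketch {

  // io.bytes.per.checksum defaults to 512 bytes; assumed here for illustration.
  static final int BYTES_PER_CHECKSUM = 512;

  /** Verify one chunk against its stored checksum value. */
  static void verifyChunk(byte[] chunk, int len, long stored, long chunkOffset)
      throws IOException {
    CRC32 crc = new CRC32();
    crc.update(chunk, 0, len);
    if (crc.getValue() != stored) {
      // The reader only learns the position of the corrupt chunk, not
      // whether the data or the checksum itself was damaged on disk.
      throw new IOException("Checksum error: at " + chunkOffset);
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] data = new byte[BYTES_PER_CHECKSUM];
    data[0] = 42;

    CRC32 crc = new CRC32();
    crc.update(data, 0, data.length);
    long stored = crc.getValue();

    verifyChunk(data, data.length, stored, 0L);   // passes

    data[10] ^= 0x01;                             // simulate one flipped bit on disk
    try {
      verifyChunk(data, data.length, stored, 8437760L);
    } catch (IOException e) {
      System.out.println(e.getMessage());         // prints: Checksum error: at 8437760
    }
  }
}
{code}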
> checksum exceptions on trunk
> ----------------------------
>
>                 Key: HADOOP-2893
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2893
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 0.17.0
>            Reporter: lohit vijayarenu
>
> While running jobs like Sort/WordCount on trunk I see a few task failures with ChecksumException.
> Re-running the tasks on different nodes succeeds.
> Here is the stack:
> {noformat}
> Map output lost, rescheduling: getMapOutput(task_200802251721_0004_m_000237_0,29) failed :
> org.apache.hadoop.fs.ChecksumException: Checksum error: /tmps/4/gs203240-29657-6751459769688273/mapred-tt/mapred-local/task_200802251721_0004_m_000237_0/file.out at 2085376
>         at org.apache.hadoop.fs.FSInputChecker.verifySum(FSInputChecker.java:276)
>         at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:238)
>         at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:189)
>         at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:157)
>         at java.io.DataInputStream.read(DataInputStream.java:132)
>         at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2299)
>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
>         at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
>         at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
>         at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
>         at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
>         at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
>         at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
>         at org.mortbay.http.HttpServer.service(HttpServer.java:954)
>         at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
>         at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
>         at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
>         at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
>         at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
>         at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)
> {noformat}
> another stack:
> {noformat}
> Caused by: org.apache.hadoop.fs.ChecksumException: Checksum error: /tmps/4/gs203240-29657-6751459769688273/mapred-tt/mapred-local/task_200802251721_0004_r_000110_0/map_367.out at 21884416
>         at org.apache.hadoop.fs.FSInputChecker.verifySum(FSInputChecker.java:276)
>         at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:238)
>         at org.apache.hadoop.fs.FSInputChecker.fill(FSInputChecker.java:176)
>         at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:193)
>         at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:157)
>         at java.io.DataInputStream.readFully(DataInputStream.java:178)
>         at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:56)
>         at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:90)
>         at org.apache.hadoop.io.SequenceFile$Reader.nextRawKey(SequenceFile.java:1930)
>         at org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawKey(SequenceFile.java:2958)
>         at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.next(SequenceFile.java:2716)
>         at org.apache.hadoop.mapred.ReduceTask$ValuesIterator.getNext(ReduceTask.java:209)
>         at org.apache.hadoop.mapred.ReduceTask$ValuesIterator.next(ReduceTask.java:177)
>         ... 5 more
> {noformat}
> both with local files

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.