[ 
https://issues.apache.org/jira/browse/HADOOP-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579017#action_12579017
 ] 

Koji Noguchi commented on HADOOP-2893:
--------------------------------------

One user reported this problem.

Two checksum errors occurred on a single node, although the failures happened 
at different times:

Task task_200803121849_0433_r_000208_0
2008-03-15 05:28:35,738 INFO org.apache.hadoop.fs.FSInputChecker: Found 
checksum error: org.apache.hadoop.fs.ChecksumException: Checksum error: 
/tmps/3//mapred-tt/mapred-local/task_200803121849_0433_r_000208_0/map_8850.out 
at 8437760

and 

Task 
2008-03-15 05:27:11,025 INFO org.apache.hadoop.fs.FSInputChecker: Found 
checksum error: org.apache.hadoop.fs.ChecksumException: Checksum error: 
/tmps/3//mapred-tt/mapred-local/task_200803121849_0433_r_000047_0/map_10813.out 
at 4214784

However, the merges that produced these files appear to have finished at almost 
the same time, on the same disk:

2008-03-15 05:18:31,720 INFO org.apache.hadoop.mapred.ReduceTask: 
task_200803121849_0433_r_000208_0 Merge of the 786 files in InMemoryFileSystem 
complete. Local file is 
/tmps/3/mapred-tt/mapred-local/task_200803121849_0433_r_000208_0/map_8850.out

and 

2008-03-15 05:18:26,157 INFO org.apache.hadoop.mapred.ReduceTask: 
task_200803121849_0433_r_000047_0 Merge of the 788 files in InMemoryFileSystem 
complete. Local file is 
/tmps/3/mapred-tt/mapred-local/task_200803121849_0433_r_000047_0/map_10813.out

Could this be related?
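
One way to check this on other nodes is to pair each checksum error with the 
merge that wrote the same file. The script below is only a rough sketch, not 
part of Hadoop: the function name correlate, the two regular expressions, and 
the choice to treat the first two path components (for example /tmps/3) as 
"the disk" are all assumptions made for illustration. The sample input is the 
four log lines quoted above with their wrapped lines re-joined. Paths are 
normalized first because the checksum messages contain "/tmps/3//mapred-tt" 
while the merge messages contain "/tmps/3/mapred-tt".

```python
import posixpath
import re
from collections import defaultdict

# The four log lines from this comment, with mail line-wrapping undone.
LOG = """\
2008-03-15 05:28:35,738 INFO org.apache.hadoop.fs.FSInputChecker: Found checksum error: org.apache.hadoop.fs.ChecksumException: Checksum error: /tmps/3//mapred-tt/mapred-local/task_200803121849_0433_r_000208_0/map_8850.out at 8437760
2008-03-15 05:27:11,025 INFO org.apache.hadoop.fs.FSInputChecker: Found checksum error: org.apache.hadoop.fs.ChecksumException: Checksum error: /tmps/3//mapred-tt/mapred-local/task_200803121849_0433_r_000047_0/map_10813.out at 4214784
2008-03-15 05:18:31,720 INFO org.apache.hadoop.mapred.ReduceTask: task_200803121849_0433_r_000208_0 Merge of the 786 files in InMemoryFileSystem complete. Local file is /tmps/3/mapred-tt/mapred-local/task_200803121849_0433_r_000208_0/map_8850.out
2008-03-15 05:18:26,157 INFO org.apache.hadoop.mapred.ReduceTask: task_200803121849_0433_r_000047_0 Merge of the 788 files in InMemoryFileSystem complete. Local file is /tmps/3/mapred-tt/mapred-local/task_200803121849_0433_r_000047_0/map_10813.out
"""

# Assumed patterns, written to match the message shapes quoted in this comment.
CHECKSUM_RE = re.compile(r"^(\S+ \S+) INFO .*Checksum error: (\S+) at (\d+)$")
MERGE_RE = re.compile(
    r"^(\S+ \S+) INFO .*Merge of the (\d+) files in InMemoryFileSystem "
    r"complete\. Local file is (\S+)$"
)


def correlate(log_text):
    """Group checksum errors by disk and attach the merge time of each file."""
    merges = {}   # normalized path -> timestamp of the merge that wrote it
    errors = []   # (normalized path, timestamp, byte offset)
    for line in log_text.splitlines():
        m = MERGE_RE.match(line)
        if m:
            merges[posixpath.normpath(m.group(3))] = m.group(1)
            continue
        m = CHECKSUM_RE.match(line)
        if m:
            path = posixpath.normpath(m.group(2))
            errors.append((path, m.group(1), int(m.group(3))))

    by_disk = defaultdict(list)
    for path, when, offset in errors:
        # Assumption: the first two path components identify the disk.
        disk = "/".join(path.split("/")[:3])
        by_disk[disk].append(
            {
                "file": path,
                "error_at": when,
                "offset": offset,
                "merged_at": merges.get(path),  # None if no merge line seen
            }
        )
    return dict(by_disk)


if __name__ == "__main__":
    for disk, entries in correlate(LOG).items():
        print(disk)
        for entry in entries:
            print("  ", entry)
```

If several failing files on one disk share nearly the same merged_at time, that 
would support the suspicion above; it would not by itself show the cause.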



> checksum exceptions on trunk
> ----------------------------
>
>                 Key: HADOOP-2893
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2893
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 0.17.0
>            Reporter: lohit vijayarenu
>
> While running jobs like Sort/WordCount on trunk, I see a few task failures with 
> ChecksumException.
> Re-running the tasks on different nodes succeeds. 
> Here is the stack:
> {noformat}
> Map output lost, rescheduling: getMapOutput(task_200802251721_0004_m_000237_0,29) failed :
> org.apache.hadoop.fs.ChecksumException: Checksum error: /tmps/4/gs203240-29657-6751459769688273/mapred-tt/mapred-local/task_200802251721_0004_m_000237_0/file.out at 2085376
>   at org.apache.hadoop.fs.FSInputChecker.verifySum(FSInputChecker.java:276)
>   at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:238)
>   at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:189)
>   at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:157)
>   at java.io.DataInputStream.read(DataInputStream.java:132)
>   at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2299)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
>   at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
>   at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
>   at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
>   at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
>   at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
>   at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
>   at org.mortbay.http.HttpServer.service(HttpServer.java:954)
>   at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
>   at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
>   at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
>   at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
>   at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
>   at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)
> {noformat}
> Another stack:
> {noformat}
> Caused by: org.apache.hadoop.fs.ChecksumException: Checksum error: /tmps/4/gs203240-29657-6751459769688273/mapred-tt/mapred-local/task_200802251721_0004_r_000110_0/map_367.out at 21884416
>   at org.apache.hadoop.fs.FSInputChecker.verifySum(FSInputChecker.java:276)
>   at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:238)
>   at org.apache.hadoop.fs.FSInputChecker.fill(FSInputChecker.java:176)
>   at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:193)
>   at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:157)
>   at java.io.DataInputStream.readFully(DataInputStream.java:178)
>   at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:56)
>   at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:90)
>   at org.apache.hadoop.io.SequenceFile$Reader.nextRawKey(SequenceFile.java:1930)
>   at org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawKey(SequenceFile.java:2958)
>   at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.next(SequenceFile.java:2716)
>   at org.apache.hadoop.mapred.ReduceTask$ValuesIterator.getNext(ReduceTask.java:209)
>   at org.apache.hadoop.mapred.ReduceTask$ValuesIterator.next(ReduceTask.java:177)
>   ... 5 more
> {noformat}
> Both failures involve local files.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
