[ https://issues.apache.org/jira/browse/HADOOP-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500087 ]

Julian Neil commented on HADOOP-573:
------------------------------------


No. Sorry, I should have been clearer: I replaced it with non-ECC memory. Do you 
still think this may be the cause? Can you explain why you think this would fix 
the problem?

Reading through the other, similar issues on checksum errors, it appears that 
files (or their checksum files) are becoming corrupted when written to disk at 
various stages of map/reduce processing. There are reports of errors in map 
output, during sorting, and in reduce output.
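
For anyone following along, here is a rough sketch (not Hadoop's actual code; the
class names are made up, and the 512-byte chunk size is just the usual default for
io.bytes.per.checksum) of how the block-wise CRC scheme surfaces these errors:
the writer stores one CRC per fixed-size chunk in a companion checksum file, and
a later read fails at the bad chunk's byte offset when the data and the stored
CRC disagree, exactly like the "Checksum error: ... at 5342920704" message in
the stack below.

    import java.util.zip.CRC32;

    public class ChecksumSketch {
        // Hypothetical chunk size; Hadoop's io.bytes.per.checksum defaults to 512.
        static final int BYTES_PER_SUM = 512;

        // One CRC per chunk, as a writer would store them in the side checksum file.
        static long[] checksums(byte[] data) {
            int chunks = (data.length + BYTES_PER_SUM - 1) / BYTES_PER_SUM;
            long[] sums = new long[chunks];
            CRC32 crc = new CRC32();
            for (int c = 0; c < chunks; c++) {
                crc.reset();
                int off = c * BYTES_PER_SUM;
                crc.update(data, off, Math.min(BYTES_PER_SUM, data.length - off));
                sums[c] = crc.getValue();
            }
            return sums;
        }

        // Re-verify on read; return the byte offset of the first bad chunk, or -1.
        static long verify(byte[] data, long[] stored) {
            long[] actual = checksums(data);
            for (int c = 0; c < stored.length; c++)
                if (actual[c] != stored[c]) return (long) c * BYTES_PER_SUM;
            return -1;
        }

        public static void main(String[] args) {
            byte[] data = new byte[4096];
            long[] sums = checksums(data);  // checksums written while data was clean
            data[1300] ^= 1;                // one bit flips after the sums exist
            System.out.println("Checksum error at " + verify(data, sums)); // 1024
        }
    }

The point being: the checksum only tells you the data and its CRC disagree at
read time, not whether the data, the CRC, or both were written wrong.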

It smacks of a tricky threading issue, both because Hadoop is fairly complex in 
its use of threads and because the bug is intermittent.
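
To illustrate why a race fits the symptoms, here is a contrived example (again,
not actual Hadoop code): two threads share one output stream with no lock held
across a whole record, so records can interleave mid-write on some runs and come
out clean on others, and nothing complains until the data is read back and
validated.

    import java.io.ByteArrayOutputStream;

    public class RaceSketch {
        public static void main(String[] args) throws Exception {
            int corrupted = 0;
            for (int run = 0; run < 1000; run++) {
                final ByteArrayOutputStream shared = new ByteArrayOutputStream();
                // Two writers append 64-byte records one byte at a time;
                // each write() call is atomic, but a whole record is not.
                Thread a = new Thread(() -> { for (int i = 0; i < 64; i++) shared.write('A'); });
                Thread b = new Thread(() -> { for (int i = 0; i < 64; i++) shared.write('B'); });
                a.start(); b.start(); a.join(); b.join();

                // A clean "file" is one whole record then the other; anything
                // else means the records interleaved and the framing is broken.
                String out = shared.toString("US-ASCII");
                char[] as = new char[64]; java.util.Arrays.fill(as, 'A');
                char[] bs = new char[64]; java.util.Arrays.fill(bs, 'B');
                String aa = new String(as), bb = new String(bs);
                if (!out.equals(aa + bb) && !out.equals(bb + aa)) corrupted++;
            }
            System.out.println(corrupted + "/1000 runs produced a corrupt file");
        }
    }

How often it corrupts depends entirely on thread scheduling, which is exactly 
the kind of intermittency people are reporting here.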



> Checksum error during sorting in reducer
> ----------------------------------------
>
>                 Key: HADOOP-573
>                 URL: https://issues.apache.org/jira/browse/HADOOP-573
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>            Assignee: Owen O'Malley
>
> Many reduce tasks got killed due to checksum error. The strange thing is that 
> the file was generated by the sort function, and was on a local disk. Here is 
> the stack: 
> Checksum error:  ../task_0011_r_000140_0/all.2.1 at 5342920704
>       at org.apache.hadoop.fs.FSDataInputStream$Checker.verifySum(FSDataInputStream.java:134)
>       at org.apache.hadoop.fs.FSDataInputStream$Checker.read(FSDataInputStream.java:110)
>       at org.apache.hadoop.fs.FSDataInputStream$PositionCache.read(FSDataInputStream.java:170)
>       at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>       at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
>       at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
>       at java.io.DataInputStream.readFully(DataInputStream.java:176)
>       at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:55)
>       at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:89)
>       at org.apache.hadoop.io.SequenceFile$Reader.readBuffer(SequenceFile.java:1061)
>       at org.apache.hadoop.io.SequenceFile$Reader.seekToCurrentValue(SequenceFile.java:1126)
>       at org.apache.hadoop.io.SequenceFile$Reader.nextRaw(SequenceFile.java:1354)
>       at org.apache.hadoop.io.SequenceFile$Sorter$MergeStream.next(SequenceFile.java:1880)
>       at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:1938)
>       at org.apache.hadoop.io.SequenceFile$Sorter$MergePass.run(SequenceFile.java:1802)
>       at org.apache.hadoop.io.SequenceFile$Sorter.mergePass(SequenceFile.java:1749)
>       at org.apache.hadoop.io.SequenceFile$Sorter.sort(SequenceFile.java:1494)
>       at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:240)
>       at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1066)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
