io.skip.checksum.errors property clashes with LocalFileSystem#reportChecksumFailure
-----------------------------------------------------------------------------------

         Key: HADOOP-145
         URL: http://issues.apache.org/jira/browse/HADOOP-145
     Project: Hadoop
        Type: Bug

  Components: io  
    Reporter: [EMAIL PROTECTED]


Below is from email to the dev list on Tue, 11 Apr 2006 14:46:09 -0700.

Checksum errors seem to be a fact of life given the hardware we use.  They'll 
often cause my jobs to fail, so I have been trying to figure out how to just 
skip the bad records and files.  At the end is a note where Stefan pointed me 
at 'io.skip.checksum.errors'.  This property, when set, triggers special 
handling of checksum errors inside SequenceFile$Reader: on a checksum error, 
the reader tries to skip to the next record.  However, this behavior can 
conflict with another checksum handler, LocalFileSystem#reportChecksumFailure, 
which moves the problematic file aside whenever a checksum error is found.
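
For reference, turning the property on is a one-liner in the job 
configuration.  A minimal sketch (only the setBoolean call is the point here; 
the surrounding setup is illustrative, not taken from this issue):

    import org.apache.hadoop.mapred.JobConf;

    // Illustrative sketch: only the setBoolean call matters here.
    JobConf conf = new JobConf();
    // Ask SequenceFile$Reader to skip past records with bad checksums.
    conf.setBoolean("io.skip.checksum.errors", true);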
Below is an excerpt from a recent log:

060411 202203 task_r_22esh3  Moving bad file /2/hadoop/tmp/task_r_22esh3/task_m_e3chga.out to /2/bad_files/task_m_e3chga.out.1707416716
060411 202203 task_r_22esh3  Bad checksum at 3578152. Skipping entries.
060411 202203 task_r_22esh3  Error running child
060411 202203 task_r_22esh3 java.nio.channels.ClosedChannelException
060411 202203 task_r_22esh3     at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:89)
060411 202203 task_r_22esh3     at sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:276)
060411 202203 task_r_22esh3     at org.apache.hadoop.fs.LocalFileSystem$LocalFSFileInputStream.seek(LocalFileSystem.java:79)
060411 202203 task_r_22esh3     at org.apache.hadoop.fs.FSDataInputStream$Checker.seek(FSDataInputStream.java:67)
060411 202203 task_r_22esh3     at org.apache.hadoop.fs.FSDataInputStream$PositionCache.seek(FSDataInputStream.java:164)
060411 202203 task_r_22esh3     at org.apache.hadoop.fs.FSDataInputStream$Buffer.seek(FSDataInputStream.java:193)
060411 202203 task_r_22esh3     at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:243)
060411 202203 task_r_22esh3     at org.apache.hadoop.io.SequenceFile$Reader.seek(SequenceFile.java:420)
060411 202203 task_r_22esh3     at org.apache.hadoop.io.SequenceFile$Reader.sync(SequenceFile.java:431)
060411 202203 task_r_22esh3     at org.apache.hadoop.io.SequenceFile$Reader.handleChecksumException(SequenceFile.java:412)
060411 202203 task_r_22esh3     at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:389)
060411 202203 task_r_22esh3     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:209)
060411 202203 task_r_22esh3     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:709)

(Ignore the Java line numbers: my code differs a little from main because I 
have extra debugging code inside SequenceFile.  Otherwise I'm running with the 
head of hadoop.)

SequenceFile$Reader#handleChecksumException is trying to skip to the next 
record, but the file has already been closed by the move-aside in 
LocalFileSystem#reportChecksumFailure, so the seek it does via sync() fails 
with the ClosedChannelException above.
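
For context, the skip path in SequenceFile$Reader looks roughly like the 
sketch below (a paraphrase, not the exact source; as noted above, the line 
numbers in my tree differ anyway):

    // Paraphrase of SequenceFile$Reader's checksum handling.  next()
    // catches the ChecksumException and calls this handler:
    private void handleChecksumException(ChecksumException e)
        throws IOException {
      if (conf.getBoolean("io.skip.checksum.errors", false)) {
        LOG.warning("Bad checksum at " + getPosition() + ". Skipping entries.");
        // This sync() is the seek that fails once reportChecksumFailure
        // has already moved the file aside and closed the channel.
        sync(getPosition() + conf.getInt("io.bytes.per.checksum", 512));
      } else {
        throw e;
      }
    }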


On the list there has been some discussion of the merits of moving a file 
aside when a bad checksum is found.  I've been trying to test what happens if 
we leave the file in place, but I haven't had a checksum error in a while.
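
If leaving the file in place works out, one way to reconcile the two handlers 
would be to key the move-aside off the same property the reader uses.  This 
is a hypothetical sketch only: the method below is a stand-in, not the real 
LocalFileSystem#reportChecksumFailure signature, and moveToBadFiles is an 
invented helper standing in for the current move-aside behavior.

    import java.io.File;
    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;

    // Hypothetical sketch: honor io.skip.checksum.errors here too, so
    // the move-aside and the reader's skip logic cannot conflict.
    void onChecksumFailure(Configuration conf, File bad) throws IOException {
      if (conf.getBoolean("io.skip.checksum.errors", false)) {
        return;  // leave the file in place; the reader will sync past it
      }
      moveToBadFiles(bad);  // stand-in for the current move-aside behavior
    }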

Opening this issue so there is a place to fill in experience as we go.

