[
https://issues.apache.org/jira/browse/HADOOP-145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris Nauroth updated HADOOP-145:
---------------------------------
Assignee: Owen O'Malley (was: Chris Nauroth)
> io.skip.checksum.errors property clashes with
> LocalFileSystem#reportChecksumFailure
> -----------------------------------------------------------------------------------
>
> Key: HADOOP-145
> URL: https://issues.apache.org/jira/browse/HADOOP-145
> Project: Hadoop Common
> Issue Type: Bug
> Components: io
> Reporter: stack
> Assignee: Owen O'Malley
>
> Below is from an email to the dev list on Tue, 11 Apr 2006 14:46:09 -0700.
> Checksum errors seem to be a fact of life given the hardware we use. They'll
> often cause my jobs to fail, so I have been trying to figure out how to just
> skip the bad records and files. At the end is a note where Stefan pointed me
> at 'io.skip.checksum.errors'. This property, when set, triggers special
> handling of checksum errors inside SequenceFile$Reader: if a checksum error
> occurs, the reader tries to skip to the next record (see the sketch at the
> end of this note). However, this behavior can conflict with another checksum
> handler that moves the problematic file aside whenever a checksum error is
> found. Below is from a recent log.
> 060411 202203 task_r_22esh3 Moving bad file /2/hadoop/tmp/task_r_22esh3/task_m_e3chga.out to /2/bad_files/task_m_e3chga.out.1707416716
> 060411 202203 task_r_22esh3 Bad checksum at 3578152. Skipping entries.
> 060411 202203 task_r_22esh3 Error running child
> 060411 202203 task_r_22esh3 java.nio.channels.ClosedChannelException
> 060411 202203 task_r_22esh3 at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:89)
> 060411 202203 task_r_22esh3 at sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:276)
> 060411 202203 task_r_22esh3 at org.apache.hadoop.fs.LocalFileSystem$LocalFSFileInputStream.seek(LocalFileSystem.java:79)
> 060411 202203 task_r_22esh3 at org.apache.hadoop.fs.FSDataInputStream$Checker.seek(FSDataInputStream.java:67)
> 060411 202203 task_r_22esh3 at org.apache.hadoop.fs.FSDataInputStream$PositionCache.seek(FSDataInputStream.java:164)
> 060411 202203 task_r_22esh3 at org.apache.hadoop.fs.FSDataInputStream$Buffer.seek(FSDataInputStream.java:193)
> 060411 202203 task_r_22esh3 at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:243)
> 060411 202203 task_r_22esh3 at org.apache.hadoop.io.SequenceFile$Reader.seek(SequenceFile.java:420)
> 060411 202203 task_r_22esh3 at org.apache.hadoop.io.SequenceFile$Reader.sync(SequenceFile.java:431)
> 060411 202203 task_r_22esh3 at org.apache.hadoop.io.SequenceFile$Reader.handleChecksumException(SequenceFile.java:412)
> 060411 202203 task_r_22esh3 at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:389)
> 060411 202203 task_r_22esh3 at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:209)
> 060411 202203 task_r_22esh3 at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:709)
> (Ignore the source line numbers; my code differs a little from main because
> I have other debugging code inside SequenceFile. Otherwise I'm running with
> the head of Hadoop.)
> SequenceFile$Reader#handleChecksumException is trying to skip to the next
> record, but the file has already been closed by the move-aside.
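> For anyone who wants to reproduce the failure mode outside Hadoop, here is a
> minimal, self-contained sketch using plain java.nio (class and file names
> here are made up for illustration, not taken from the Hadoop source): one
> handler closes the channel, the other then seeks on it.
>
>     import java.io.File;
>     import java.io.FileInputStream;
>     import java.io.IOException;
>     import java.nio.channels.FileChannel;
>
>     public class ClosedSeekDemo {
>       public static void main(String[] args) throws IOException {
>         File f = File.createTempFile("task_m_demo", ".out");
>         FileChannel ch = new FileInputStream(f).getChannel();
>         // The move-aside handler (reportChecksumFailure) renames the bad
>         // file and closes the stream the reader still holds.
>         ch.close();
>         // The skip handler then tries to seek past the bad record;
>         // position() on a closed channel throws
>         // java.nio.channels.ClosedChannelException, as in the log above.
>         ch.position(1024);
>       }
>     }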
> On the list there is some discussion of the merits of moving the file aside
> when a bad checksum is found. I've been trying to test what happens if we
> leave the file in place, but I haven't had a checksum error in a while.
> Opening this issue as a place to fill in experience as we go.
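> For reference, the sketch mentioned above: the skip path triggered by
> 'io.skip.checksum.errors' looks roughly like the fragment below. This is a
> paraphrase from memory of SequenceFile$Reader, so exact names and constants
> may differ from the source.
>
>     // Paraphrase of SequenceFile$Reader's checksum-skip path; not a
>     // verbatim copy of the Hadoop source.
>     private void handleChecksumException(ChecksumException e)
>       throws IOException {
>       if (conf.getBoolean("io.skip.checksum.errors", false)) {
>         LOG.warning("Bad checksum at " + getPosition() + ". Skipping entries.");
>         // sync() seeks forward and scans for the next record boundary;
>         // this is the seek that fails with ClosedChannelException once
>         // reportChecksumFailure has already closed the file.
>         sync(getPosition() + conf.getInt("io.bytes.per.checksum", 512));
>       } else {
>         throw e;
>       }
>     }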
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira