[
https://issues.apache.org/jira/browse/HADOOP-145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris Nauroth updated HADOOP-145:
---------------------------------
Assignee: Owen O'Malley (was: Chris Nauroth)
> io.skip.checksum.errors property clashes with
> LocalFileSystem#reportChecksumFailure
> -----------------------------------------------------------------------------------
>
> Key: HADOOP-145
> URL: https://issues.apache.org/jira/browse/HADOOP-145
> Project: Hadoop Common
> Issue Type: Bug
> Components: io
> Reporter: stack
> Assignee: Owen O'Malley
>
> Below is from an email to the dev list on Tue, 11 Apr 2006 14:46:09 -0700.
> Checksum errors seem to be a fact of life given the hardware we use. They'll
> often cause my jobs to fail, so I have been trying to figure out how to just
> skip the bad records and files. At the end is a note where Stefan pointed me
> at 'io.skip.checksum.errors'. This property, when set, triggers special
> handling of checksum errors inside SequenceFile$Reader: if a checksum error
> occurs, the reader tries to skip to the next record (see the sketch at the
> end of this note). However, this behavior can conflict with another checksum
> handler that moves the problematic file aside whenever a checksum error is
> found. Below is from a recent log.
> 060411 202203 task_r_22esh3 Moving bad file /2/hadoop/tmp/task_r_22esh3/task_m_e3chga.out to /2/bad_files/task_m_e3chga.out.1707416716
> 060411 202203 task_r_22esh3 Bad checksum at 3578152. Skipping entries.
> 060411 202203 task_r_22esh3 Error running child
> 060411 202203 task_r_22esh3 java.nio.channels.ClosedChannelException
> 060411 202203 task_r_22esh3 at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:89)
> 060411 202203 task_r_22esh3 at sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:276)
> 060411 202203 task_r_22esh3 at org.apache.hadoop.fs.LocalFileSystem$LocalFSFileInputStream.seek(LocalFileSystem.java:79)
> 060411 202203 task_r_22esh3 at org.apache.hadoop.fs.FSDataInputStream$Checker.seek(FSDataInputStream.java:67)
> 060411 202203 task_r_22esh3 at org.apache.hadoop.fs.FSDataInputStream$PositionCache.seek(FSDataInputStream.java:164)
> 060411 202203 task_r_22esh3 at org.apache.hadoop.fs.FSDataInputStream$Buffer.seek(FSDataInputStream.java:193)
> 060411 202203 task_r_22esh3 at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:243)
> 060411 202203 task_r_22esh3 at org.apache.hadoop.io.SequenceFile$Reader.seek(SequenceFile.java:420)
> 060411 202203 task_r_22esh3 at org.apache.hadoop.io.SequenceFile$Reader.sync(SequenceFile.java:431)
> 060411 202203 task_r_22esh3 at org.apache.hadoop.io.SequenceFile$Reader.handleChecksumException(SequenceFile.java:412)
> 060411 202203 task_r_22esh3 at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:389)
> 060411 202203 task_r_22esh3 at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:209)
> 060411 202203 task_r_22esh3 at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:709)
> (Ignore the source line numbers; my code differs a little from main because
> I have other debugging code inside SequenceFile. Otherwise I'm running with
> the head of Hadoop.)
> SequenceFile$Reader#handleChecksumException is trying to skip to the next
> record, but the file has already been closed by the move-aside.
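> For anyone who wants to reproduce the failure mode outside Hadoop, here is a
> minimal, self-contained sketch using plain java.nio (class and file names
> here are made up for illustration, not taken from the Hadoop source): one
> handler closes the channel, the other then seeks on it.
>
>     import java.io.File;
>     import java.io.FileInputStream;
>     import java.io.IOException;
>     import java.nio.channels.FileChannel;
>
>     public class ClosedSeekDemo {
>       public static void main(String[] args) throws IOException {
>         File f = File.createTempFile("task_m_demo", ".out");
>         FileChannel ch = new FileInputStream(f).getChannel();
>         // The move-aside handler (reportChecksumFailure) renames the bad
>         // file and closes the stream the reader still holds.
>         ch.close();
>         // The skip handler then tries to seek past the bad record;
>         // position() on a closed channel throws
>         // java.nio.channels.ClosedChannelException, as in the log above.
>         ch.position(1024);
>       }
>     }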
> On the list there is some discussion of the merits of moving the file aside
> when a bad checksum is found. I've been trying to test what happens if we
> leave the file in place, but I haven't had a checksum error in a while.
> Opening this issue as a place to fill in experience as we go.
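> For reference, the sketch mentioned above: the skip path triggered by
> 'io.skip.checksum.errors' looks roughly like the fragment below. This is a
> paraphrase from memory of SequenceFile$Reader, so exact names and constants
> may differ from the source.
>
>     // Paraphrase of SequenceFile$Reader's checksum-skip path; not a
>     // verbatim copy of the Hadoop source.
>     private void handleChecksumException(ChecksumException e)
>       throws IOException {
>       if (conf.getBoolean("io.skip.checksum.errors", false)) {
>         LOG.warning("Bad checksum at " + getPosition() + ". Skipping entries.");
>         // sync() seeks forward and scans for the next record boundary;
>         // this is the seek that fails with ClosedChannelException once
>         // reportChecksumFailure has already closed the file.
>         sync(getPosition() + conf.getInt("io.bytes.per.checksum", 512));
>       } else {
>         throw e;
>       }
>     }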
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira