Doug Cutting wrote:
Michael Stack wrote:
One question: 'io.skip.checksum.errors' is only read in SequenceFile#next, but the LocalFileSystem checksum-error "move-aside" handler can be triggered by callers other than SequenceFile#next. If so, stopping the LocalFileSystem move-aside on checksum error is probably not the right thing to do.

Right, we ideally want SequenceFile to disable it when that flag is set. But that would take a lot of plumbing to implement!
Yes.
Perhaps we should instead fix this by not closing the file in LocalFileSystem.reportChecksumFailure. Then it won't be able to move the file aside on Windows. To fix that, we can (1) try to move it without closing it (since something on the stack will eventually close it anyway, and may still need it open) and (2) if the move fails, try closing it and moving it (for Windows). The net effect is that io.skip.checksum.errors will then work on Unix but not on Windows. Or we could skip moving it altogether, since it seems that most checksum errors we're seeing are not disk errors but memory errors before the data hits the disk.
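
A minimal sketch of steps (1) and (2) above, in plain Java with java.nio.file rather than Hadoop's own classes. The class name MoveAsideSketch, the method moveAside, and the badDir parameter are hypothetical names chosen for illustration; this is not the code in LocalFileSystem.reportChecksumFailure.

```java
import java.io.Closeable;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not Hadoop code: "move first, close only if the move fails".
final class MoveAsideSketch {
    private MoveAsideSketch() {}

    /**
     * Tries to move a suspect file into badDir while the caller's stream is
     * still open. If that first move fails for any I/O reason (for example
     * because the platform refuses to move an open file), closes the stream
     * and retries once. Assumes no file of the same name already sits in badDir.
     *
     * @return true if the stream was left open, false if it had to be closed
     */
    static boolean moveAside(Path file, Path badDir, Closeable openStream)
            throws IOException {
        Files.createDirectories(badDir);
        Path target = badDir.resolve(file.getFileName().toString());
        try {
            // Step (1): move without closing; a caller up the stack may still
            // need the stream and will close it eventually.
            Files.move(file, target);
            return true;
        } catch (IOException firstAttempt) {
            // Step (2): fall back to close-then-move.
            openStream.close();
            Files.move(file, target);
            return false;
        }
    }
}
```

With this shape, a reader that wants to skip past the bad record keeps a usable stream whenever the first move succeeds, and loses it only on the fallback path.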
What if we did not move the file? A checksum error would be thrown. If we're inside SequenceFile#next and 'io.skip.checksum.errors' is set, then we'll just try to move to the next record. I do not have enough experience with the code base to know whether not moving the file will create weird scenarios elsewhere.
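
To make the "throw, or skip to the next record" behaviour concrete, here is a toy model in plain Java. It is only a sketch: the Record type, the per-record CRC32, and the readAll method are assumptions made for illustration and do not reflect SequenceFile's on-disk format or API. The skipChecksumErrors parameter stands in for the 'io.skip.checksum.errors' setting.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

// Toy model only: not SequenceFile's format or API.
final class SkipChecksumSketch {
    private SkipChecksumSketch() {}

    /** A record payload plus the CRC32 value stored when it was written. */
    static final class Record {
        final byte[] payload;
        final long storedCrc;

        Record(byte[] payload, long storedCrc) {
            this.payload = payload;
            this.storedCrc = storedCrc;
        }
    }

    /** Computes the CRC32 of the given bytes. */
    static long crcOf(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /**
     * Returns the payloads of all records whose checksum verifies.
     * On a mismatch, either skips the record (flag set) or throws (flag unset).
     */
    static List<byte[]> readAll(List<Record> records, boolean skipChecksumErrors)
            throws IOException {
        List<byte[]> good = new ArrayList<>();
        for (int i = 0; i < records.size(); i++) {
            Record r = records.get(i);
            if (crcOf(r.payload) != r.storedCrc) {
                if (skipChecksumErrors) {
                    continue; // drop the bad record and move on to the next one
                }
                throw new IOException("Checksum error in record " + i);
            }
            good.add(r.payload);
        }
        return good;
    }
}
```

In this model the file itself is never touched on a checksum error; the only decision is whether the reader propagates the error or carries on.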

A checksum failure on a local file currently causes the task to fail. But it takes multiple checksum errors per job to get a job to fail, right? Is that what's happening?
It is. Jobs are long-running -- a day or more (I should probably try cutting them into smaller pieces). What I usually see is a failure for some genuinely odd reason. Then the task lands on a machine that has started to exhibit checksum errors. After each failure, the task is rescheduled, and it always seems to land back on the problematic machine. (Is there anything I can do to randomize which machine a task gets assigned to?)

St.Ack
