Yes, our sequence files are stored in HDFS.
Some of them are built with the FileUtil.copyMerge routine and others
are the output of a mapper or a reducer, but all of them are in HDFS.
Eric Baldeschwieler wrote:
I created HADOOP-2497 to describe this bug.
Was your sequence file stored on HDFS? I ask because HDFS does provide
checksums.
On Dec 28, 2007, at 7:20 AM, Jason Venner wrote:
Our OOM was being caused by a damaged sequence data file. We had
assumed that the sequence files had checksums, which appears to be
incorrect.
The deserializer was reading a bad length out of the file and trying
to allocate 4 GB of RAM.
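
For illustration, here is a minimal sketch of the kind of guard that would
turn this failure into a clear error instead of an OOM: check the length
prefix before allocating the buffer. This is not the actual SequenceFile
reader code. The class name BoundedRecordReader, the readRecord helper, the
4-byte length-prefix layout, and the 64 MB cap are all assumptions made up
for the example.

```java
import java.io.DataInputStream;
import java.io.IOException;

class BoundedRecordReader {
    // Hypothetical upper bound on a single record; pick a value that fits the data.
    static final int MAX_RECORD_BYTES = 64 * 1024 * 1024;

    // Reads one record laid out as a 4-byte big-endian length followed by
    // that many payload bytes (an assumed layout, not the SequenceFile format).
    static byte[] readRecord(DataInputStream in) throws IOException {
        int length = in.readInt();
        // Reject the length before allocating, so a corrupt prefix cannot
        // trigger a multi-gigabyte allocation.
        if (length < 0 || length > MAX_RECORD_BYTES) {
            throw new IOException("Implausible record length " + length
                    + "; the file is probably corrupt");
        }
        byte[] buf = new byte[length];
        // readFully throws EOFException if the stream ends early.
        in.readFully(buf);
        return buf;
    }
}
```

A check like this only catches lengths that are obviously out of range. It
does not replace a checksum, since a damaged length that still falls under
the cap would pass.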