I have a MapReduce job that merges a set of SequenceFiles (BytesWritable
keys, LongWritable values) into a single SequenceFile with the same
key/value types, and the reduce phase fails with the following error:
java.io.IOException: Value too large for defined data type
    at java.io.FileInputStream.available(Native Method)
    at org.apache.hadoop.fs.LocalFileSystem$LocalFSFileInputStream.available(LocalFileSystem.java:96)
    at java.io.FilterInputStream.available(FilterInputStream.java:169)
    at java.io.FilterInputStream.available(FilterInputStream.java:169)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:332)
    at java.io.DataInputStream.readFully(DataInputStream.java:202)
    at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:55)
    at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:89)
    at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:405)
    at org.apache.hadoop.io.SequenceFile$Sorter$MergeStream.next(SequenceFile.java:871)
    at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:915)
    at org.apache.hadoop.io.SequenceFile$Sorter$MergePass.run(SequenceFile.java:800)
    at org.apache.hadoop.io.SequenceFile$Sorter.mergePass(SequenceFile.java:738)
    at org.apache.hadoop.io.SequenceFile$Sorter.sort(SequenceFile.java:542)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:218)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1013)
This error message doesn't make much sense to me; a long should be quite
sufficient to hold the values, and the BytesWritable keys don't change.
Any ideas as to what could be wrong or how I can debug this to locate the
problem?
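
For context, the job is set up roughly like the sketch below (simplified,
with placeholder class and path names; identity map/reduce, SequenceFile
input and output, one reduce task so everything lands in one file; the
exact API calls vary between Hadoop versions):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class MergeSequenceFiles {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(MergeSequenceFiles.class);
        conf.setJobName("merge-sequencefiles");

        // Records pass through untouched; the job only merges many
        // input SequenceFiles into a single output SequenceFile.
        conf.setMapperClass(IdentityMapper.class);
        conf.setReducerClass(IdentityReducer.class);
        conf.setNumReduceTasks(1);

        // BytesWritable keys, LongWritable values, in and out.
        conf.setOutputKeyClass(BytesWritable.class);
        conf.setOutputValueClass(LongWritable.class);

        conf.setInputFormat(SequenceFileInputFormat.class);
        conf.setOutputFormat(SequenceFileOutputFormat.class);

        // Placeholder paths: directory of input SequenceFiles, output dir.
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}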
Thanks,
--
Vetle Roeim
Opera Software ASA <URL: http://www.opera.com/ >