Hello,
I'm using the TextOutputFormat in mapreduce/lib/output with Hadoop
0.20, and it appears not to write all the keys to the output file, even
though the write method in the RecordWriter is receiving them. Let me explain:
1) I copied TextOutputFormat, adding only some debugging print messages:
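    // write() from my copy of TextOutputFormat's LineRecordWriter;
    // only the two println calls at the end were added for debugging.
    public synchronized void write(K key, V value) throws IOException {
      boolean nullKey = key == null || key instanceof NullWritable;
      boolean nullValue = value == null || value instanceof NullWritable;
      // Nothing to write when both key and value are null.
      if (nullKey && nullValue) {
        return;
      }
      if (!nullKey) {
        writeObject(key);
      }
      // The separator is written only between a non-null key and value.
      if (!(nullKey || nullValue)) {
        out.write(keyValueSeparator);
      }
      if (!nullValue) {
        writeObject(value);
      }
      out.write(newline);
      // Debug output; string concatenation prints "null" instead of
      // throwing a NullPointerException when key or value is null.
      System.out.println("Key=" + key);
      System.out.println("Value=" + value);
    }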
I expect 52 keys, corresponding to the upper- and lower-case letters of
the alphabet. I usually get fewer than 52 keys in the output folder; the
count varies between runs (sometimes 44), and once I even got all 52.
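To pin down exactly which keys go missing in a given run, one could compare the output lines against the expected 52 keys. Below is a minimal, Hadoop-independent Java sketch; the class MissingKeys and its methods are hypothetical helpers, and it assumes each output line has the form key, separator, value, as produced by the write method above (with the default tab separator). Reading the part files into a list of lines is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: reports which of the 52 expected single-letter keys
// are absent from TextOutputFormat-style lines ("key<separator>value").
public class MissingKeys {

    // Expected keys: "A".."Z" followed by "a".."z" (52 in total).
    static List<String> expectedKeys() {
        List<String> keys = new ArrayList<String>();
        for (char c = 'A'; c <= 'Z'; c++) keys.add(String.valueOf(c));
        for (char c = 'a'; c <= 'z'; c++) keys.add(String.valueOf(c));
        return keys;
    }

    // Returns the expected keys that do not appear as the key field of any line.
    static List<String> missing(List<String> lines, String separator) {
        Set<String> seen = new HashSet<String>();
        for (String line : lines) {
            int i = line.indexOf(separator);
            // A line without the separator is treated as a bare key.
            seen.add(i < 0 ? line : line.substring(0, i));
        }
        List<String> result = new ArrayList<String>();
        for (String k : expectedKeys()) {
            if (!seen.contains(k)) result.add(k);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("A\t1", "b\t2");
        System.out.println(missing(lines, "\t").size()); // prints 50
    }
}
```

Running it over the concatenated output of all reducers lists the absent keys directly, which can then be matched against the Key= debug messages in each reducer's log.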
/However/, the write method above does receive the missing key/value
pairs, as evidenced by the log file messages, i.e. I see Key=(missing key)
and Value=(missing value).
Hence, for some reason, either a) it is not writing, b) it is writing but
not flushing/committing, or c) the temporary outputs are getting deleted.
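Hypothesis b) can be illustrated without Hadoop. In this minimal Java sketch (FlushDemo and bytesVisible are hypothetical names), a record written through a buffered DataOutputStream is not visible in the underlying sink until the stream is closed, which flushes the buffer. By analogy, it may be worth checking that the copied RecordWriter's close method still closes its output stream.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Standalone illustration of hypothesis b): bytes written through a
// buffered stream only reach the underlying sink after flush() or close().
public class FlushDemo {

    // Writes one "key\tvalue\n" record and returns how many bytes the
    // underlying sink holds, optionally closing the stream first.
    static int bytesVisible(boolean close) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        DataOutputStream out =
            new DataOutputStream(new BufferedOutputStream(sink, 8192));
        try {
            out.write("A\t1\n".getBytes(StandardCharsets.UTF_8));
            if (close) {
                out.close(); // flushes the 4 buffered bytes into the sink
            }
        } catch (IOException e) {
            // An in-memory sink should not fail; rethrow unchecked just in case.
            throw new UncheckedIOException(e);
        }
        return sink.size();
    }

    public static void main(String[] args) {
        System.out.println(bytesVisible(false)); // prints 0
        System.out.println(bytesVisible(true));  // prints 4
    }
}
```

Since the record is far smaller than the 8192-byte buffer, it stays buffered until close() is called; a writer that is never closed would lose it silently.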
Also, if a given reducer has received, e.g., 5 keys, I see messages for
all 5 keys, yet a few (but not all) of them are missing from the output.
SequenceFileOutputFormat does not have the same issue (all 52 keys are present).
Any ideas? Is it my bug?
Kind Regards
Saptarshi
Version: 0.20.0, r763504
Compiled: Thu Apr 9 05:18:40 UTC 2009 by ndaley
Identifier: 200908281653
Saptarshi Guha | http://www.stat.purdue.edu/~sguha
Kindness is a language which the deaf can hear and the blind can read.
-- Mark Twain