Hi Saptarshi,

Are you able to reproduce this on the 0.20.1rc1 uploaded last week?
http://people.apache.org/~omalley/hadoop-0.20.1-rc1/

If so, it would be worth putting together a test case. If you can reproduce this in a JUnit test (even if it only happens once every few runs), you should definitely open a JIRA.

Thanks,
-Todd

On Sat, Sep 5, 2009 at 12:22 PM, Saptarshi Guha <[email protected]> wrote:
> Hello,
> I'm using the TextOutputFormat in mapreduce/lib/output with Hadoop 0.20,
> and it appears that it is not writing all the keys to the output file, even
> though the write method in the RecordWriter is receiving them. Let me explain.
>
> 1) I copied TextOutputFormat, save for some debugging print messages:
>
>     public synchronized void write(K key, V value)
>       throws IOException {
>
>       boolean nullKey = key == null || key instanceof NullWritable;
>       boolean nullValue = value == null || value instanceof NullWritable;
>       if (nullKey && nullValue) {
>         return;
>       }
>       if (!nullKey) {
>         writeObject(key);
>       }
>       if (!(nullKey || nullValue)) {
>         out.write(keyValueSeparator);
>       }
>       if (!nullValue) {
>         writeObject(value);
>       }
>       out.write(newline);
>
>       System.out.println("Key="+key.toString());
>       System.out.println("Value="+value.toString());
>     }
>
> I expect 52 keys, corresponding to the upper- and lower-case letters of the
> alphabet. I get fewer than 52 keys in the output folder: sometimes 44, some
> times, and once even 52.
> /However/, the write method above does receive the missing key/value pairs,
> as evidenced by the log file messages, i.e. I see Key=(missing key) and
> Value=(missing-value).
> Hence, for some reason, a) it is not writing, b) it is writing but not
> flushing/committing, or c) the temporary outputs are getting deleted.
> Also, if a given reducer has received e.g. 5 keys, I see messages for all 5
> keys, of which a few (but not all) are missing from the output.
>
> SequenceFileOutputFormat does not have the same issue (all 52 are present).
>
> Any ideas? My bug?
>
> Kind Regards
> Saptarshi
>
> Version: 0.20.0, r763504
> Compiled: Thu Apr 9 05:18:40 UTC 2009 by ndaley
> Identifier: 200908281653
>
> Saptarshi Guha | [email protected] | http://www.stat.purdue.edu/~sguha
> Kindness is a language which the deaf can hear and the blind can read.
> -- Mark Twain
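Hypothesis (b), writing but not flushing, can be illustrated without Hadoop at all. What follows is a minimal sketch, not Hadoop's code and not a diagnosis of this report: `LineWriterSketch`, `writeAlphabet` and `countLines` are hypothetical names, a plain `null` stands in for `NullWritable`, and a `BufferedOutputStream` over a byte array stands in for the task's output file. It mirrors the branching of the quoted `write()` and shows that a debug line can be printed for every key while none of the bytes reach the sink if the writer is never closed. A reduction this small, turned into a JUnit test against the real classes, is the kind of test case suggested in the reply above.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical, Hadoop-free stand-in for a TextOutputFormat-style line writer.
// A plain null plays the role of NullWritable; no Hadoop classes are used.
final class LineWriterSketch {
    private static final byte[] SEPARATOR = "\t".getBytes(StandardCharsets.UTF_8);
    private static final byte[] NEWLINE = "\n".getBytes(StandardCharsets.UTF_8);

    private final DataOutputStream out;

    LineWriterSketch(DataOutputStream out) {
        this.out = out;
    }

    // Same branching as the quoted write(): skip fully-null records and
    // emit the separator only when both key and value are present.
    synchronized void write(Object key, Object value) throws IOException {
        boolean nullKey = key == null;
        boolean nullValue = value == null;
        if (nullKey && nullValue) {
            return;
        }
        if (!nullKey) {
            out.write(key.toString().getBytes(StandardCharsets.UTF_8));
        }
        if (!(nullKey || nullValue)) {
            out.write(SEPARATOR);
        }
        if (!nullValue) {
            out.write(value.toString().getBytes(StandardCharsets.UTF_8));
        }
        out.write(NEWLINE);
        // String concatenation prints "null" for a null key, unlike key.toString(),
        // which would throw a NullPointerException.
        System.out.println("Key=" + key + " Value=" + value);
    }

    // Closing the DataOutputStream flushes the BufferedOutputStream beneath it.
    synchronized void close() throws IOException {
        out.close();
    }

    // Writes the 52 letter keys and returns whatever actually reached the sink.
    static String writeAlphabet(boolean closeWriter) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Buffer far larger than the 208 bytes written, so nothing is flushed implicitly.
        LineWriterSketch writer = new LineWriterSketch(
                new DataOutputStream(new BufferedOutputStream(sink, 1 << 16)));
        try {
            for (char c = 'A'; c <= 'Z'; c++) {
                writer.write(String.valueOf(c), 1);
                writer.write(String.valueOf(Character.toLowerCase(c)), 1);
            }
            if (closeWriter) {
                writer.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(sink.toByteArray(), StandardCharsets.UTF_8);
    }

    // Counts newline-terminated records in the sink's contents.
    static int countLines(String text) {
        int lines = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\n') {
                lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println("closed:   " + countLines(writeAlphabet(true)) + " lines");
        System.out.println("unclosed: " + countLines(writeAlphabet(false)) + " lines");
    }
}
```

Both calls print all 52 `Key=` debug lines, but only `writeAlphabet(true)` leaves 52 records in the sink; `writeAlphabet(false)` returns an empty string. In a real job the equivalent question is whether the framework's `close()` on the RecordWriter runs, and completes, before the task output is committed.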
