Hello,
The problem is rather odd. I installed the version you mentioned and still have the same problem. My HBBytesWritable has a toString method, and TextOutputFormat calls it.
a) If my toString method outputs the bytes (like the toString method in BytesWritable), I do not have any skipped keys.
b) If instead my toString calls an external function (given a byte[], return a String), then although TextOutputFormat receives the bytes (as I mentioned before), they do not get written to disk.

Not sure whether this is my design fault or not.
Regards
Saptarshi

On Mon, Sep 7, 2009 at 12:16 PM, Todd Lipcon <[email protected]> wrote:
> Hi Saptarshi,
>
> Are you able to reproduce this on the 0.20.1rc1 uploaded last week?
>
> http://people.apache.org/~omalley/hadoop-0.20.1-rc1/
>
> If so, it would be worth putting together a test case. If you can reproduce
> this in a JUnit test (even if it only happens once every few runs) you
> should definitely open a JIRA.
>
> Thanks,
> -Todd
>
> On Sat, Sep 5, 2009 at 12:22 PM, Saptarshi Guha
> <[email protected]> wrote:
>
>> Hello,
>> I'm using the TextOutputFormat in mapreduce/lib/output with Hadoop 0.20
>> and it appears it is not writing all the keys to the output file even though
>> the write method in the RecordWriter is receiving them. Let me explain.
>>
>> 1) I copied TextOutputFormat save for some debugging print messages
>>
>>     public synchronized void write(K key, V value)
>>         throws IOException {
>>
>>       boolean nullKey = key == null || key instanceof NullWritable;
>>       boolean nullValue = value == null || value instanceof NullWritable;
>>       if (nullKey && nullValue) {
>>         return;
>>       }
>>       if (!nullKey) {
>>         writeObject(key);
>>       }
>>       if (!(nullKey || nullValue)) {
>>         out.write(keyValueSeparator);
>>       }
>>       if (!nullValue) {
>>         writeObject(value);
>>       }
>>       out.write(newline);
>>
>>       System.out.println("Key="+key.toString());
>>       System.out.println("Value="+value.toString());
>>     }
>>
>> I expect 52 keys corresponding to the upper/lower case keys of the
>> alphabet. I get < 52 keys in the output folder, sometimes 44, some times,
>> and once even 52.
>> /However/, the write method above does receive the missing K,V value as
>> evidenced by the log file messages, i.e. I see Key=(missing key) and
>> Value=(missing-value).
>> Hence for some reason, a) it is not writing, b) writing but not
>> flushing/committing or c) the temporary outputs are getting deleted.
>> Also if a given reducer has received e.g. 5 keys, I see messages for 5
>> keys, of which a few (but not all) are missing.
>>
>> SequenceFileOutputFormat does not have the same issues (all 52 present).
>>
>> Any ideas? My bug?
>> Kind Regards
>> Saptarshi
>>
>> Version: 0.20.0, r763504
>> Compiled: Thu Apr 9 05:18:40 UTC 2009 by ndaley
>> Identifier: 200908281653
>>
>> Saptarshi Guha | [email protected] |
>> http://www.stat.purdue.edu/~sguha
>> Kindness is a language which the deaf can hear and the blind can read.
>> -- Mark Twain
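
A minimal sketch of the kind of standalone check suggested above, for comparing the two toString variants with Hadoop taken out of the picture. Everything here is an assumption made for illustration: FakeKey is a hypothetical stand-in for HBBytesWritable, the renderer functions stand in for the "external function", the value "1" is made up, and writeRecord only imitates the write path quoted above (toString() encoded as UTF-8, a tab separator, then a newline). It does not reproduce or explain the missing keys. It only shows whether a given toString yields exactly one line per key, which would not be the case if the rendered string contained a newline or if toString threw an exception.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

class ToStringWriteCheck {

    // Hypothetical stand-in for HBBytesWritable: holds raw bytes and
    // delegates toString() to a pluggable "external" function.
    static final class FakeKey {
        private final byte[] bytes;
        private final Function<byte[], String> renderer;

        FakeKey(byte[] bytes, Function<byte[], String> renderer) {
            this.bytes = bytes;
            this.renderer = renderer;
        }

        @Override
        public String toString() {
            return renderer.apply(bytes);
        }
    }

    // Case (a): render the bytes as hex, similar in spirit to BytesWritable.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Imitation of the quoted write path (assumption): key, tab, value, newline,
    // with each object written as the UTF-8 bytes of its toString().
    static void writeRecord(DataOutputStream out, Object key, Object value)
            throws IOException {
        out.write(key.toString().getBytes(StandardCharsets.UTF_8));
        out.write('\t');
        out.write(value.toString().getBytes(StandardCharsets.UTF_8));
        out.write('\n');
    }

    // Writes the 52 upper/lower case alphabet keys and counts newline bytes.
    static int countLines(Function<byte[], String> renderer) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(sink)) {
            for (char c = 'a'; c <= 'z'; c++) {
                for (char k : new char[] {c, Character.toUpperCase(c)}) {
                    byte[] raw = String.valueOf(k).getBytes(StandardCharsets.UTF_8);
                    writeRecord(out, new FakeKey(raw, renderer), "1");
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String text = new String(sink.toByteArray(), StandardCharsets.UTF_8);
        return (int) text.chars().filter(ch -> ch == '\n').count();
    }

    public static void main(String[] args) {
        // Case (a): byte-dump style toString.
        System.out.println("hex renderer lines: " + countLines(ToStringWriteCheck::hex));
        // Case (b): placeholder for the external byte[] -> String function.
        System.out.println("external renderer lines: "
                + countLines(b -> new String(b, StandardCharsets.UTF_8)));
    }
}
```

Swapping the real byte[]-to-String function in for the second renderer would show whether it produces anything other than one line per key. If both cases report 52 here, the difference more likely lies in the job itself than in the string conversion.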
