Hello,
I'm using TextOutputFormat from mapreduce/lib/output with Hadoop 0.20, and it appears not to write all the keys to the output file, even though the
write method of its RecordWriter is receiving them. Let me explain.

1) I copied TextOutputFormat unchanged, except for some debugging print messages added to the write method:

    public synchronized void write(K key, V value)
      throws IOException {

      boolean nullKey = key == null || key instanceof NullWritable;
      boolean nullValue = value == null || value instanceof NullWritable;
      if (nullKey && nullValue) {
        return;
      }
      if (!nullKey) {
        writeObject(key);
      }
      if (!(nullKey || nullValue)) {
        out.write(keyValueSeparator);
      }
      if (!nullValue) {
        writeObject(value);
      }
      out.write(newline);

      // Debugging output. String concatenation prints "null" for a null
      // reference, whereas key.toString() would throw a NullPointerException
      // when only one of key/value is null.
      System.out.println("Key=" + key);
      System.out.println("Value=" + value);
    }
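The println calls above show that write() was invoked, not that the bytes reached the file: if the stream behind `out` buffers its data, records only land in the sink once the writer is flushed or closed. The sketch below illustrates that effect with plain java.io only (no Hadoop classes; `BufferDemo` and `sinkSizes` are hypothetical names). It is a generic demonstration of buffering, not a diagnosis of this particular problem.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class BufferDemo {
    // Writes one "key<TAB>value\n" record through a buffered DataOutputStream and
    // returns the sink's size as {before close, after close}.
    static int[] sinkSizes() {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stands in for the output file
        try {
            DataOutputStream out =
                new DataOutputStream(new BufferedOutputStream(sink, 8192));
            out.write("a\t1\n".getBytes(StandardCharsets.UTF_8)); // 4 bytes, smaller than the buffer
            int beforeClose = sink.size();                         // bytes are still in the buffer
            out.close();                                           // close() flushes the buffer first
            int afterClose = sink.size();
            return new int[] {beforeClose, afterClose};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] sizes = sinkSizes();
        // prints "before close: 0 bytes, after close: 4 bytes"
        System.out.println("before close: " + sizes[0] + " bytes, after close: " + sizes[1] + " bytes");
    }
}
```

In other words, a log line from write() and a missing record in the file are not contradictory by themselves; what matters is whether the writer was closed (and its output committed) when the task finished.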

I expect 52 keys, corresponding to the upper- and lower-case letters of the alphabet. I get fewer than 52 keys in the output folder, and the count varies from run to run: sometimes 44, and once even all 52. /However/, the write method above does receive the missing key/value pairs, as evidenced by the log messages, i.e. I see Key=(missing key) and Value=(missing value). Hence, for some reason, either:
 a) it is not writing,
 b) it is writing but not flushing/committing, or
 c) the temporary outputs are getting deleted.

Also, if a given reducer has received, e.g., 5 keys, I see log messages for all 5 keys, yet a few (but not all) of them are missing from the output.
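To see exactly which keys are lost in a given run, rather than just the count, one option is to compare the text output against the 52 expected keys. A minimal sketch, assuming the default TextOutputFormat layout (key, TAB, value on each line) and that the part-r-* files have first been copied to local disk; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MissingKeys {
    // The 52 expected keys: "A".."Z" followed by "a".."z".
    static List<String> expectedKeys() {
        List<String> keys = new ArrayList<>();
        for (char c = 'A'; c <= 'Z'; c++) keys.add(String.valueOf(c));
        for (char c = 'a'; c <= 'z'; c++) keys.add(String.valueOf(c));
        return keys;
    }

    // Returns the expected keys that never appear as the key field of any output line.
    static List<String> missing(List<String> outputLines) {
        Set<String> seen = new HashSet<>();
        for (String line : outputLines) {
            int tab = line.indexOf('\t');
            // Lines written with a null value have no separator; the whole line is the key.
            seen.add(tab >= 0 ? line.substring(0, tab) : line);
        }
        List<String> result = new ArrayList<>();
        for (String k : expectedKeys()) {
            if (!seen.contains(k)) result.add(k);
        }
        return result;
    }

    // Usage: java MissingKeys part-r-00000 part-r-00001 ...
    public static void main(String[] args) throws IOException {
        List<String> lines = new ArrayList<>();
        for (String path : args) lines.addAll(Files.readAllLines(Paths.get(path)));
        List<String> miss = missing(lines);
        System.out.println(miss.size() + " missing: " + miss);
    }
}
```

Matching the reported missing keys against the reducer log lines would show whether the lost records cluster in particular part files.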

SequenceFileOutputFormat does not have the same issue (all 52 keys are present).

Any ideas? Is this my bug?
Kind Regards
Saptarshi

Version: 0.20.0, r763504
Compiled: Thu Apr 9 05:18:40 UTC 2009 by ndaley
Identifier: 200908281653



Saptarshi Guha | http://www.stat.purdue.edu/~sguha
Kindness is a language which the deaf can hear and the blind can read.
                -- Mark Twain
