What I've done is create a simple wrapper class, "TaggedWritable", that
has a String (the "tag") and a Writable as fields. That does the trick.
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

// readString/writeString are the static String helpers on org.apache.hadoop.io.Text
import static org.apache.hadoop.io.Text.readString;
import static org.apache.hadoop.io.Text.writeString;

public class TaggedWritable implements Writable {

    private String m_key;
    private Writable m_value;

    // Hadoop needs the no-arg constructor to instantiate this class during
    // deserialization.
    public TaggedWritable() { }

    public TaggedWritable(String key, Writable value) {
        m_key = key;
        m_value = value;
    }

    public String getTag() {
        return m_key;
    }

    public Writable getValue() {
        return m_value;
    }

    @SuppressWarnings("unchecked")
    @Override
    public void readFields(DataInput in) throws IOException {
        m_key = readString(in);
        // The wrapped value's class name is serialized ahead of the value,
        // so it can be instantiated reflectively before reading its fields.
        String className = readString(in);
        try {
            Class<Writable> valueClass = (Class<Writable>) Class.forName(className);
            m_value = valueClass.newInstance();
            m_value.readFields(in);
        } catch (Exception ex) {
            throw new IllegalStateException("Error converting " + className
                + " to writable", ex);
        }
    }

    @Override
    public void write(DataOutput out) throws IOException {
        writeString(out, m_key);
        writeString(out, m_value.getClass().getName());
        m_value.write(out);
    }
}
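In case it helps, here is roughly how it gets wired up (a sketch only; the
mapper, key, and tag names below are placeholders, not my actual job code).
In the driver I set job.setMapOutputValueClass(TaggedWritable.class), so the
value class check in MapTask passes because every emitted value really is a
TaggedWritable.

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Illustrative mapper for one of the two input types.
public class LongFileMapper extends Mapper<LongWritable, Text, Text, TaggedWritable> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        // Wrap the real value (a LongWritable here) together with a tag so
        // the reducer knows which input it came from; the reducer can branch
        // on getTag() or on instanceof against getValue().
        context.write(new Text(line.toString()),
                      new TaggedWritable("counts", new LongWritable(1L)));
    }
}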
On Jan 27, 2010, at 10:49 AM, Alex Kozlov wrote:
Currently map output supports only one class, but that does not
prevent you from encapsulating another field or class in your own
Writable and serializing it.
AVRO is supposed to support multiple formats out of the box, but it
does not have Input/Output formats yet (0.21.0?).
Hadoop uses its own serialization rather than Java serialization
for performance reasons.
Alex
On Tue, Jan 26, 2010 at 5:17 PM, Wilkes, Chris <cwil...@gmail.com>
wrote:
I'm outputting a Text and a LongWritable in my mapper and told the
job that my mapper output class is Writable (the interface shared by
both of them):
job.setMapOutputValueClass(Writable.class);
I'm doing this as I have two different types of input files and am
combining them together. I could write them both as Text, but then
I'd have to put a marker in front of the tag to indicate what type
of entry it is, instead of doing a

if (value instanceof Text) { ... } else if (value instanceof LongWritable) { ... }
This exception is thrown:

java.io.IOException: Type mismatch in value from map: expected org.apache.hadoop.io.Writable, recieved org.apache.hadoop.io.LongWritable
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:812)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:504)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
The MapTask code (which is being used even though I'm using the new
API) shows that a != is used to compare the classes:
http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapred/MapTask.java?view=log
if (value.getClass() != valClass) {
    throw new IOException("Type mismatch in value from map: expected "
        + valClass.getName() + ", recieved "
        + value.getClass().getName());
}
Does this level of checking really need to be done? Could it just
be a Class.isAssignableFrom() check?
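For what it's worth, a tiny standalone check (class names here are just for
the demo, not a patch against MapTask) showing why isAssignableFrom would
accept the subtype where != does not:

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Writable;

public class ValueCheckDemo {
    public static void main(String[] args) {
        Class<?> valClass = Writable.class;          // what the job declared
        Writable value = new LongWritable(42L);      // what the mapper emitted

        // Current check: true, because LongWritable.class != Writable.class,
        // so the IOException path is taken.
        System.out.println(value.getClass() != valClass);

        // Suggested check: true, because LongWritable implements Writable,
        // so no exception would be thrown.
        System.out.println(valClass.isAssignableFrom(value.getClass()));
    }
}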