What I've done is create a simple wrapper class, "TaggedWritable", that has a String (the "tag") and a Writable as fields. That does the trick.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

// readString/writeString below are assumed to be the static helpers on
// org.apache.hadoop.io.Text (WritableUtils has equivalents).
import static org.apache.hadoop.io.Text.readString;
import static org.apache.hadoop.io.Text.writeString;

public class TaggedWritable implements Writable {
    private String m_key;      // the tag
    private Writable m_value;  // the wrapped value

    // No-arg constructor so Hadoop can create instances via reflection.
    public TaggedWritable() {
    }

    public TaggedWritable(String key, Writable value) {
        m_key = key;
        m_value = value;
    }

    public String getTag() {
        return m_key;
    }

    public Writable getValue() {
        return m_value;
    }

    @SuppressWarnings("unchecked")
    @Override
    public void readFields(DataInput in) throws IOException {
        m_key = readString(in);
        // The concrete class name of the wrapped value was written first;
        // instantiate it reflectively and let it deserialize itself.
        String className = readString(in);
        try {
            Class<Writable> valueClass = (Class<Writable>) Class.forName(className);
            m_value = valueClass.newInstance();
            m_value.readFields(in);
        } catch (Exception ex) {
            throw new IllegalStateException("Error converting " + className + " to writable", ex);
        }
    }

    @Override
    public void write(DataOutput out) throws IOException {
        writeString(out, m_key);
        // Record the concrete class name so readFields() knows what to instantiate.
        writeString(out, m_value.getClass().getName());
        m_value.write(out);
    }
}
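
For reference, a rough sketch (the mapper and tag names here are invented, not from my actual job) of how the wrapper plugs into the job from the question below: the job declares the single concrete class TaggedWritable as the map output value class, and the reducer unwraps and branches on the wrapped type.

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Illustrative mapper: both kinds of values are emitted wrapped in TaggedWritable,
// so the job only ever sees one concrete map output value class.
public class TaggingMapper extends Mapper<LongWritable, Text, Text, TaggedWritable> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        // "count" and "name" are made-up tags for the two input file types.
        context.write(new Text("someKey"), new TaggedWritable("count", new LongWritable(42)));
        context.write(new Text("someKey"), new TaggedWritable("name", new Text("example")));
    }
}

// Driver side:
//   job.setMapOutputKeyClass(Text.class);
//   job.setMapOutputValueClass(TaggedWritable.class);  // one concrete class, so MapTask's check passes
//
// Reducer side:
//   Writable v = tagged.getValue();
//   if (v instanceof LongWritable) { ... } else if (v instanceof Text) { ... }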

On Jan 27, 2010, at 10:49 AM, Alex Kozlov wrote:

Currently the map output supports only a single class, but that does not prevent you from encapsulating another field or class inside your own Writable and serializing it yourself.
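
(For what it's worth, Hadoop also ships an abstract org.apache.hadoop.io.GenericWritable that does this kind of wrapping for a fixed set of types; a minimal sketch, with the class name and type list chosen here purely for illustration:)

import org.apache.hadoop.io.GenericWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

// Sketch: subclass GenericWritable and enumerate the Writable types it may wrap.
public class TextOrLongWritable extends GenericWritable {
    @SuppressWarnings("unchecked")
    private static final Class<? extends Writable>[] TYPES =
            new Class[] { Text.class, LongWritable.class };

    @Override
    protected Class<? extends Writable>[] getTypes() {
        return TYPES;
    }
}

// Usage: wrapper.set(new LongWritable(42)); ... Writable v = wrapper.get();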

Avro is supposed to handle multiple formats out of the box, but it does not have Hadoop Input/Output formats yet (0.21.0?).

Hadoop uses its own serialization rather than Java serialization for performance reasons.

Alex

On Tue, Jan 26, 2010 at 5:17 PM, Wilkes, Chris <cwil...@gmail.com> wrote:

I'm outputting a Text and a LongWritable in my mapper and told the job that my map output value class is Writable (the interface shared by both of them):
  job.setMapOutputValueClass(Writable.class);
I'm doing this because I have two different types of input files and am combining them. I could write them both as Text, but then I'd have to put a marker in front of each value to indicate what type of entry it is, instead of doing if (value instanceof Text) { ... } else if (value instanceof LongWritable) { ... }
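
(Roughly the kind of branching meant here, in a reducer over Writable values; illustrative only:)

// Illustrative only: branch on the runtime type of each value in the reducer.
for (Writable value : values) {
    if (value instanceof Text) {
        // handle a record from the text-valued input
    } else if (value instanceof LongWritable) {
        // handle a record from the count-valued input
    }
}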

This exception is thrown:

java.io.IOException: Type mismatch in value from map: expected org.apache.hadoop.io.Writable, recieved org.apache.hadoop.io.LongWritable
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:812)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:504)
        at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)

The MapTask code (which is being used even though I'm using the new API) shows that a != is used to compare the classes:
http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapred/MapTask.java?view=log
      if (value.getClass() != valClass) {
        throw new IOException("Type mismatch in value from map: expected "
                              + valClass.getName() + ", recieved "
                              + value.getClass().getName());
      }

Does this level of checking really need to be done? Could it just be a Class.isAssignableFrom() check?
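
(For concreteness, the relaxed check being suggested would look something like this; a sketch only, not the actual MapTask code:)

// Sketch of the suggested check: accept any subtype/implementation of the
// declared map output value class instead of requiring an exact class match.
if (!valClass.isAssignableFrom(value.getClass())) {
  throw new IOException("Type mismatch in value from map: expected "
                        + valClass.getName() + ", received "
                        + value.getClass().getName());
}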




