Hey Chris,

You may want to look at https://issues.apache.org/jira/browse/MAPREDUCE-1126
and https://issues.apache.org/jira/browse/MAPREDUCE-815 to see how Avro is
being integrated into MapReduce. In particular, I think you would be well
served by Avro's union type, though I'm not sure I understand your use case
completely.
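In Avro's schema JSON a union is just an array of branch schemas, e.g.
["string", "long"]. As a quick, untested sketch with the Avro Java API
(org.apache.avro.Schema), the equivalent union can be built like this:

import java.util.Arrays;
import org.apache.avro.Schema;

// A union schema whose values may be either a string or a long, so a
// single output value "slot" can carry both types without a wrapper class.
Schema union = Schema.createUnion(Arrays.asList(
    Schema.create(Schema.Type.STRING),
    Schema.create(Schema.Type.LONG)));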
Thanks,
Jeff

On Wed, Jan 27, 2010 at 11:01 AM, Wilkes, Chris <cwil...@gmail.com> wrote:

> What I've done is created a simple wrapper class "TaggedWritable" that
> has a String (the "tag") and a Writable as fields. That does the trick.
>
> import java.io.DataInput;
> import java.io.DataOutput;
> import java.io.IOException;
>
> import org.apache.hadoop.io.Writable;
>
> // readString/writeString below are the WritableUtils helpers.
> import static org.apache.hadoop.io.WritableUtils.readString;
> import static org.apache.hadoop.io.WritableUtils.writeString;
>
> public class TaggedWritable implements Writable {
>   private String m_key;
>   private Writable m_value;
>
>   public TaggedWritable() {
>   }
>
>   public TaggedWritable(String key, Writable value) {
>     m_key = key;
>     m_value = value;
>   }
>
>   public String getTag() {
>     return m_key;
>   }
>
>   public Writable getValue() {
>     return m_value;
>   }
>
>   @SuppressWarnings("unchecked")
>   @Override
>   public void readFields(DataInput in) throws IOException {
>     m_key = readString(in);
>     // The concrete class name is written before the value, so the
>     // matching Writable can be instantiated here on the read side.
>     String className = readString(in);
>     try {
>       Class<Writable> valueClass = (Class<Writable>) Class.forName(className);
>       m_value = valueClass.newInstance();
>       m_value.readFields(in);
>     } catch (Exception ex) {
>       throw new IllegalStateException("Error converting " + className
>           + " to writable", ex);
>     }
>   }
>
>   @Override
>   public void write(DataOutput out) throws IOException {
>     writeString(out, m_key);
>     writeString(out, m_value.getClass().getName());
>     m_value.write(out);
>   }
> }
>
> On Jan 27, 2010, at 10:49 AM, Alex Kozlov wrote:
>
> Currently map output supports only one class, but that does not prevent
> you from encapsulating another field or class in your own Writable and
> serializing it.
>
> Avro <http://hadoop.apache.org/avro/docs/current/spec.html> is supposed
> to support multiple formats out of the box, but it does not have
> Input/Output formats yet (0.21.0?).
>
> Hadoop uses its own serialization rather than Java serialization for
> performance reasons.
>
> Alex
>
> On Tue, Jan 26, 2010 at 5:17 PM, Wilkes, Chris <cwil...@gmail.com> wrote:
>
>> I'm outputting a Text and a LongWritable in my mapper and told the job
>> that my map output value class is Writable (the interface shared by
>> both of them):
>>
>> job.setMapOutputValueClass(Writable.class);
>>
>> I'm doing this because I have two different types of input files and am
>> combining them. I could write them both as Text, but then I'd have to
>> put a marker in front of each value to indicate what type of entry it
>> is, instead of doing a
>>
>> if (value instanceof Text) { } else if (value instanceof LongWritable) { }
>>
>> This exception is thrown:
>>
>> java.io.IOException: Type mismatch in value from map: expected
>> org.apache.hadoop.io.Writable, recieved org.apache.hadoop.io.LongWritable
>>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:812)
>>         at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:504)
>>         at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>>
>> The MapTask code (which is being used even though I'm using the new API)
>> shows that a != is used to compare the classes:
>>
>> http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapred/MapTask.java?view=log
>>
>> if (value.getClass() != valClass) {
>>   throw new IOException("Type mismatch in value from map: expected "
>>       + valClass.getName() + ", recieved "
>>       + value.getClass().getName());
>> }
>>
>> Does this level of checking really need to be done? Could it just be a
>> Class.isAssignableFrom() check?
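>> I.e., something like this rough sketch (using the same valClass and
>> value as in the MapTask snippet above; untested):
>>
>> // Accept any subtype/implementation of the configured value class
>> // instead of requiring an exact class match.
>> if (!valClass.isAssignableFrom(value.getClass())) {
>>   throw new IOException("Type mismatch in value from map: expected "
>>       + valClass.getName() + ", received "
>>       + value.getClass().getName());
>> }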