What I've done is create a simple wrapper class, "TaggedWritable", that
has a String (the "tag") and a Writable as fields. That does the trick.
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

// readString/writeString are the static String helpers on org.apache.hadoop.io.Text
import static org.apache.hadoop.io.Text.readString;
import static org.apache.hadoop.io.Text.writeString;

public class TaggedWritable implements Writable {

    private String m_key;
    private Writable m_value;

    // Hadoop needs the no-arg constructor to instantiate this class during
    // deserialization.
    public TaggedWritable() { }

    public TaggedWritable(String key, Writable value) {
        m_key = key;
        m_value = value;
    }

    public String getTag() {
        return m_key;
    }

    public Writable getValue() {
        return m_value;
    }

    @SuppressWarnings("unchecked")
    @Override
    public void readFields(DataInput in) throws IOException {
        m_key = readString(in);
        // The wrapped value's class name is serialized ahead of the value,
        // so it can be instantiated reflectively before reading its fields.
        String className = readString(in);
        try {
            Class<Writable> valueClass = (Class<Writable>) Class.forName(className);
            m_value = valueClass.newInstance();
            m_value.readFields(in);
        } catch (Exception ex) {
            throw new IllegalStateException("Error converting " + className
                + " to writable", ex);
        }
    }

    @Override
    public void write(DataOutput out) throws IOException {
        writeString(out, m_key);
        writeString(out, m_value.getClass().getName());
        m_value.write(out);
    }
}
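In case it helps, here is roughly how it gets wired up (a sketch only; the
mapper, key, and tag names below are placeholders, not my actual job code).
In the driver I set job.setMapOutputValueClass(TaggedWritable.class), so the
value class check in MapTask passes because every emitted value really is a
TaggedWritable.

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Illustrative mapper for one of the two input types.
public class LongFileMapper extends Mapper<LongWritable, Text, Text, TaggedWritable> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        // Wrap the real value (a LongWritable here) together with a tag so
        // the reducer knows which input it came from; the reducer can branch
        // on getTag() or on instanceof against getValue().
        context.write(new Text(line.toString()),
                      new TaggedWritable("counts", new LongWritable(1L)));
    }
}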
On Jan 27, 2010, at 10:49 AM, Alex Kozlov wrote:
Currently map output supports only one class, but that does not
prevent you from encapsulating another field or class in your own
Writable and serializing it.
AVRO is supposed to support multiple formats out of the box, but it
does not have Input/Output formats yet (0.21.0?).
Hadoop uses its own serialization rather than Java serialization
for performance reasons.
Alex
On Tue, Jan 26, 2010 at 5:17 PM, Wilkes, Chris <cwil...@gmail.com>
wrote:
I'm outputting a Text and a LongWritable in my mapper and told the
job that my mapper output class is Writable (the interface shared by
both of them):
job.setMapOutputValueClass(Writable.class);
I'm doing this as I have two different types of input files and am
combining them together. I could write them both as Text, but then
I'd have to put a marker in front of the tag to indicate what type
of entry it is, instead of doing a

if (value instanceof Text) { ... } else if (value instanceof LongWritable) { ... }
This exception is thrown:

java.io.IOException: Type mismatch in value from map: expected org.apache.hadoop.io.Writable, recieved org.apache.hadoop.io.LongWritable
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:812)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:504)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
The MapTask code (which is being used even though I'm using the new
API) shows that a != is used to compare the classes:
http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapred/MapTask.java?view=log
if (value.getClass() != valClass) {
    throw new IOException("Type mismatch in value from map: expected "
        + valClass.getName() + ", recieved "
        + value.getClass().getName());
}
Does this level of checking really need to be done? Could it just
be a Class.isAssignableFrom() check?
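For what it's worth, a tiny standalone check (class names here are just for
the demo, not a patch against MapTask) showing why isAssignableFrom would
accept the subtype where != does not:

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Writable;

public class ValueCheckDemo {
    public static void main(String[] args) {
        Class<?> valClass = Writable.class;          // what the job declared
        Writable value = new LongWritable(42L);      // what the mapper emitted

        // Current check: true, because LongWritable.class != Writable.class,
        // so the IOException path is taken.
        System.out.println(value.getClass() != valClass);

        // Suggested check: true, because LongWritable implements Writable,
        // so no exception would be thrown.
        System.out.println(valClass.isAssignableFrom(value.getClass()));
    }
}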