Hey Chris,

You may want to see https://issues.apache.org/jira/browse/MAPREDUCE-1126 and
https://issues.apache.org/jira/browse/MAPREDUCE-815 to see how Avro is being
integrated into MapReduce. In particular, I think you would be well served
by Avro's union type, though I'm not sure I understand your use case
completely.
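
To make the union suggestion concrete: an Avro union can carry either of two types without a hand-rolled wrapper class. A sketch of what such a schema might look like (the record and field names here are my own invention, not from the JIRAs above):

```
{"type": "record",
 "name": "TaggedValue",
 "fields": [
   {"name": "tag",   "type": "string"},
   {"name": "value", "type": ["string", "long"]}
 ]}
```

A reader of the `value` field then branches on which member of the union was written, much like the instanceof check discussed below.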

Thanks,
Jeff

On Wed, Jan 27, 2010 at 11:01 AM, Wilkes, Chris <cwil...@gmail.com> wrote:

> What I've done is create a simple wrapper class, "TaggedWritable", that has
> a String (the "tag") and a Writable as fields.  That does the trick.
>
> public class TaggedWritable implements Writable {
>   private String m_key;
>   private Writable m_value;
>
>   public TaggedWritable() { }
>
>   public TaggedWritable(String key, Writable value) {
>     m_key = key;
>     m_value = value;
>   }
>
>   public String getTag() {
>     return m_key;
>   }
>
>   public Writable getValue() {
>     return m_value;
>   }
>
>   @SuppressWarnings("unchecked")
>   @Override
>   public void readFields(DataInput in) throws IOException {
>     m_key = readString(in);
>     String className = readString(in);
>     try {
>       Class<Writable> valueClass = (Class<Writable>) Class.forName(className);
>       m_value = valueClass.newInstance();
>       m_value.readFields(in);
>     } catch (Exception ex) {
>       throw new IllegalStateException("Error converting " + className + " to writable", ex);
>     }
>   }
>
>   @Override
>   public void write(DataOutput out) throws IOException {
>     writeString(out, m_key);
>     writeString(out, m_value.getClass().getName());
>     m_value.write(out);
>   }
> }
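
For readers without the Hadoop classes at hand, the wire format above (tag, then a type marker, then the value's own bytes) can be sketched with plain java.io. This is a self-contained illustration, not Chris's class; all names here are made up:

```java
import java.io.*;

public class TaggedDemo {
    // Write a tagged value: tag string, then a type marker, then the payload,
    // mirroring how TaggedWritable writes tag + class name + value bytes.
    static byte[] writeTagged(String tag, Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeUTF(tag);
        if (value instanceof Long) {
            out.writeUTF("long");
            out.writeLong((Long) value);
        } else {
            out.writeUTF("text");
            out.writeUTF(value.toString());
        }
        return bytes.toByteArray();
    }

    // Read the tag, then dispatch on the type marker to decode the payload.
    static Object[] readTagged(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        String tag = in.readUTF();
        String type = in.readUTF();
        Object value = type.equals("long") ? (Object) in.readLong() : in.readUTF();
        return new Object[] { tag, value };
    }

    public static void main(String[] args) throws IOException {
        Object[] r = readTagged(writeTagged("clicks", 42L));
        System.out.println(r[0] + " = " + r[1]);  // clicks = 42
    }
}
```

The real TaggedWritable writes the value's full class name instead of a short marker, trading a few bytes per record for not having to enumerate the types up front.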
>
> On Jan 27, 2010, at 10:49 AM, Alex Kozlov wrote:
>
> Currently the map output supports only one class, but that does not prevent
> you from encapsulating another field or class in your own Writable and
> serializing it.
>
> AVRO <http://hadoop.apache.org/avro/docs/current/spec.html> is supposed to
> have multiple formats out-of-the-box, but it does not have Input/Output
> formats yet (0.21.0?)
>
> Hadoop uses its own serialization rather than standard Java serialization
> for performance reasons.
>
> Alex
>
> On Tue, Jan 26, 2010 at 5:17 PM, Wilkes, Chris <cwil...@gmail.com> wrote:
>
>> I'm outputting a Text and a LongWritable in my mapper and told the job
>> that my mapper output value class is Writable (the interface shared by both
>> of them):
>>   job.setMapOutputValueClass(Writable.class);
>> I'm doing this as I have two different types of input files and am
>> combining them together.  I could write them both as Text but then I'd
>> have to put a marker in front of the tag to indicate what type of entry
>> it is instead of doing a
>>   if (value instanceof Text) { } else if (value instanceof LongWritable) { }
>>
>> This exception is thrown:
>>
>> java.io.IOException: Type mismatch in value from map: expected
>> org.apache.hadoop.io.Writable, recieved org.apache.hadoop.io.LongWritable
>>
>>      at 
>> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:812)
>>      at 
>> org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:504)
>>      at 
>> org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>>
>> The MapTask code (which is being used even though I'm using the new API) 
>> shows that a != is used to compare the classes:
>>
>> http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapred/MapTask.java?view=log
>>
>>   if (value.getClass() != valClass) {
>>     throw new IOException("Type mismatch in value from map: expected "
>>                           + valClass.getName() + ", recieved "
>>                           + value.getClass().getName());
>>   }
>>
>>
>> Does this level of checking really need to be done?  Could it just be a 
>> Class.isAssignableFrom() check?
>>
>>
>>
>
>
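
A note for anyone hitting the same mismatch: Hadoop already ships org.apache.hadoop.io.GenericWritable for this pattern. It serializes a small type index instead of a full class name, at the cost of listing the allowed types up front. A sketch (not compiled here, since it needs the Hadoop jars; the class name MyValue is my own):

    import org.apache.hadoop.io.GenericWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.Writable;

    // Wraps either a Text or a LongWritable.
    public class MyValue extends GenericWritable {
        @SuppressWarnings("unchecked")
        private static final Class<? extends Writable>[] TYPES =
            new Class[] { Text.class, LongWritable.class };

        @Override
        protected Class<? extends Writable>[] getTypes() {
            return TYPES;
        }
    }

With that, job.setMapOutputValueClass(MyValue.class) satisfies the exact-class check quoted above; the mapper calls set(new LongWritable(...)) or set(new Text(...)) on a MyValue instance, and the reducer unwraps with get().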
