Hi all,

One more question. I have two jobs to run serially using a JobControl.
The key-value types for the outputs of the reducer of the first job
are <ActiveDayKey, Text>, where ActiveDayKey is a class that
implements WritableComparable. And so the key-value types for the
inputs to the mapper of the second job are <ActiveDayKey, Text>.
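
For context, here's roughly what ActiveDayKey looks like (a minimal
sketch; the fields are illustrative stand-ins for what I actually
store):

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.WritableComparable;

    public class LogProcessor {
      // Nested inside LogProcessor, matching the exception message below.
      public static class ActiveDayKey
          implements WritableComparable<ActiveDayKey> {
        private Text userId = new Text();              // illustrative field
        private LongWritable day = new LongWritable(); // illustrative field

        public void write(DataOutput out) throws IOException {
          userId.write(out);
          day.write(out);
        }

        public void readFields(DataInput in) throws IOException {
          userId.readFields(in);
          day.readFields(in);
        }

        public int compareTo(ActiveDayKey other) {
          int cmp = userId.compareTo(other.userId);
          return cmp != 0 ? cmp : day.compareTo(other.day);
        }

        @Override
        public String toString() {
          return userId + "\t" + day; // what shows up in the text output
        }
      }
    }

I'm noticing two things: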

First, in the output of the reducer from the first job, each
ActiveDayKey object is being written as a string using its toString
method. Since ActiveDayKey implements WritableComparable and already
knows how to serialize itself using write(DataOutput), is there any
way to exploit that to write the output in binary format? If not, do
I need to write a subclass of FileOutputFormat?
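
For reference, the output side of the first job is left at the
defaults. A sketch of the relevant driver fragment (old mapred API;
the output path is illustrative, not my exact code):

    JobConf job1 = new JobConf(LogProcessor.class);
    job1.setOutputKeyClass(LogProcessor.ActiveDayKey.class);
    job1.setOutputValueClass(Text.class);
    // No setOutputFormat() call, so TextOutputFormat is used by default;
    // it writes key.toString() + "\t" + value.toString() per record.
    FileOutputFormat.setOutputPath(job1, new Path("/tmp/job1-out"));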

Second, the second job fails with "java.lang.ClassCastException:
org.apache.hadoop.io.LongWritable cannot be cast to
co.adhoclabs.LogProcessor$ActiveDayKey." I'm assuming this is because
the default input format keys each line by its byte offset (a
LongWritable), and here I want to ignore that offset and use the
ActiveDayKey written on the line itself as the key. Again, since
ActiveDayKey knows how to deserialize itself using
readFields(DataInput), is there any way to exploit that to read it
from the line in binary format? Or do I need to write a subclass of
FileInputFormat?
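
For completeness, the second job's mapper is declared roughly like
this (old mapred API; SecondMapper and its body are illustrative):

    import java.io.IOException;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    // Nested in LogProcessor alongside ActiveDayKey. With the default
    // TextInputFormat, the framework actually passes a LongWritable byte
    // offset as the key, so declaring ActiveDayKey here is what triggers
    // the ClassCastException above.
    public static class SecondMapper extends MapReduceBase
        implements Mapper<ActiveDayKey, Text, ActiveDayKey, Text> {
      public void map(ActiveDayKey key, Text value,
                      OutputCollector<ActiveDayKey, Text> output,
                      Reporter reporter) throws IOException {
        // ... real processing elided; identity pass-through shown here ...
        output.collect(key, value);
      }
    }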

Assuming I do need to write subclasses of FileOutputFormat and
FileInputFormat, what's a good example to follow? The terasort
example?

Thanks,
Mike
