Hi all,

One more question. I have two jobs that run serially using a JobControl. The key-value types for the output of the first job's reducer are <ActiveDayKey, Text>, where ActiveDayKey is a class that implements WritableComparable. So the key-value types for the input to the second job's mapper are also <ActiveDayKey, Text>. I'm noticing two things:
First, in the output of the first job's reducer, each ActiveDayKey object is being written as a string using its toString method. Since ActiveDayKey implements WritableComparable and already knows how to serialize itself using write(DataOutput), is there any way to exploit that so the key is written in binary format? Or do I need to write a subclass of FileOutputFormat?

Second, the second job fails with "java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to co.adhoclabs.LogProcessor$ActiveDayKey". I'm assuming this is because the default input format keys each line by its offset as a LongWritable, and here I want to ignore that and instead use the ActiveDayKey written on the line itself as the key. Again, since ActiveDayKey knows how to deserialize itself using readFields(DataInput), is there any way to exploit that to read it from the line in binary format? Or do I need to write a subclass of FileInputFormat?

Assuming I do need to write subclasses of FileOutputFormat and FileInputFormat, what's a good example of this? The terasort example?

Thanks,
Mike
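P.S. In case it helps, here's a self-contained sketch of the kind of wiring I'm imagining. The class is a simplified stand-in for my real ActiveDayKey (field names are placeholders), the paths are made up, and SequenceFileOutputFormat/SequenceFileInputFormat are just my guess at the mechanism for keeping the intermediate data binary — I haven't verified this is the right approach:

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class ChainSketch {

    // Simplified stand-in for the real key class.
    public static class ActiveDayKey implements WritableComparable<ActiveDayKey> {
        private long userId;
        private int day;

        public ActiveDayKey() {}  // no-arg constructor required by Hadoop

        public ActiveDayKey(long userId, int day) {
            this.userId = userId;
            this.day = day;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeLong(userId);
            out.writeInt(day);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            userId = in.readLong();
            day = in.readInt();
        }

        @Override
        public int compareTo(ActiveDayKey other) {
            int c = Long.compare(userId, other.userId);
            return c != 0 ? c : Integer.compare(day, other.day);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path intermediate = new Path("/tmp/intermediate"); // placeholder path

        // Job 1: write <ActiveDayKey, Text> pairs in binary,
        // rather than as strings via toString().
        Job first = Job.getInstance(conf, "first");
        first.setOutputKeyClass(ActiveDayKey.class);
        first.setOutputValueClass(Text.class);
        first.setOutputFormatClass(SequenceFileOutputFormat.class);
        FileOutputFormat.setOutputPath(first, intermediate);

        // Job 2: read the same <ActiveDayKey, Text> pairs back directly,
        // instead of TextInputFormat's <LongWritable offset, Text line>.
        Job second = Job.getInstance(conf, "second");
        second.setInputFormatClass(SequenceFileInputFormat.class);
        FileInputFormat.addInputPath(second, intermediate);
    }
}
```

That is, job 1's output format would serialize the keys with write(DataOutput), and job 2's input format would deserialize them with readFields(DataInput), so the mapper would see ActiveDayKey keys instead of LongWritable line offsets. Is this right, or do I still need the FileOutputFormat/FileInputFormat subclasses?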