On 8/26/10 7:47 PM, newpant wrote:
Hi, did you call JobConf.setInputFormat(KeyValueTextInputFormat.class) to set
the input format class? The default input format class is TextInputFormat,
whose key type is LongWritable; the key stores the byte offset of each line
in the file.
If your mapper emits a different key or value type than your reducer does,
you need to call setMapOutputKeyClass and setMapOutputValueClass.
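To make the difference between the two key types concrete, here is a minimal plain-Java sketch. It does not use Hadoop at all; the class and method names (InputFormatSketch, byteOffsets, splitKeyValue) are made up for illustration. It assumes the usual defaults: TextInputFormat keys are the byte offsets at which lines start, and KeyValueTextInputFormat splits each line at the first tab, with an empty value when there is no tab.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; it does not call or reproduce Hadoop's record readers.
public class InputFormatSketch {

    // Keys in the style of TextInputFormat: the byte offset at which each line starts.
    public static List<Long> byteOffsets(String content) {
        List<Long> offsets = new ArrayList<>();
        long offset = 0;
        for (String line : content.split("\n")) {
            offsets.add(offset);
            // +1 accounts for the newline byte that ended the line
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return offsets;
    }

    // Key/value in the style of KeyValueTextInputFormat: split at the first tab.
    // A line without a tab becomes the key, paired with an empty value.
    public static String[] splitKeyValue(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, tab), line.substring(tab + 1) };
    }

    public static void main(String[] args) {
        String content = "a\t1\nbb\t2\n";
        // Numeric keys, analogous to LongWritable offsets
        System.out.println(byteOffsets(content));
        // Textual keys and values, analogous to Text/Text
        for (String line : content.split("\n")) {
            String[] kv = splitKeyValue(line);
            System.out.println(kv[0] + " -> " + kv[1]);
        }
    }
}
```

The point is only that the first style produces numeric keys and the second produces textual ones, which is why a mapper expecting Text keys fails when the job is still reading with the default format.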
2010/8/27 Mark wrote:
When I configure my job to use KeyValueTextInputFormat, doesn't that
imply that the key and value passed to my mapper will both be Text?
I have it set up like this, and I am using the default Mapper.class, i.e.
IdentityMapper:
- KeyValueTextInputFormat.addInputPath(job, new Path(otherArgs[0]));
but I keep receiving this error:
- java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be
cast to org.apache.hadoop.io.Text
I would expect this error if I were using FileInputFormat, because that
returns the key as a LongWritable and the value as Text, but I am unsure
why it's happening here.
Also, on the same note, when I supply FileInputFormat or
KeyValueTextInputFormat, does that implicitly call job.setMapOutputKeyClass
and job.setMapOutputValueClass? When are these used?
Thanks for the clarification
No, I didn't set that, and when I did, everything worked as expected. I
thought that if I used:
KeyValueTextInputFormat.addInputPath(job, new Path(otherArgs[0]))
it would set that for me, or at least know that the input would be
Text/Text. I'm guessing that is wrong.
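For reference, a minimal driver sketch of the fix described above. It assumes the newer org.apache.hadoop.mapreduce API as shipped in Hadoop 2.x or later, with the Hadoop client libraries on the classpath; the class name KeyValueIdentityJob and the job name are hypothetical. addInputPath is a static method inherited from FileInputFormat, so calling it through KeyValueTextInputFormat only records the path and does not select the input format.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver; assumes Hadoop 2.x+ and two arguments: input path, output path.
public class KeyValueIdentityJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "key-value identity");
        job.setJarByClass(KeyValueIdentityJob.class);

        // Selects how records are read. Without this call the job uses
        // TextInputFormat, which hands the mapper (LongWritable, Text) pairs.
        job.setInputFormatClass(KeyValueTextInputFormat.class);

        // Static helpers that only register paths; they do not choose a format.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // The base Mapper passes each (key, value) pair through unchanged.
        job.setMapperClass(Mapper.class);

        // Final output types; map output types default to these unless
        // setMapOutputKeyClass / setMapOutputValueClass are called.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

With the older org.apache.hadoop.mapred API, the equivalent call would be the JobConf.setInputFormat(KeyValueTextInputFormat.class) mentioned earlier in the thread.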
If your mapper emits a different key or value type than your reducer does,
you need to call setMapOutputKeyClass and setMapOutputValueClass.
When would this ever come up? Does it just cast to the appropriate
classes then?
Thanks
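On the last question, a small plain-Java model may help. It is not Hadoop code; MapOutputTypeSketch and its methods are invented for illustration, with String and Long standing in for Text and LongWritable. It sketches two behaviours as I understand them: the map output key class falls back to the job's output key class when it is not set explicitly, and a key of the wrong class is rejected when the mapper emits it rather than being cast or converted. A typical case where the two differ is a mapper emitting (Text, IntWritable) feeding a reducer that writes (Text, DoubleWritable), such as an average.

```java
import java.io.IOException;

// Plain-Java model only; not Hadoop's implementation.
public class MapOutputTypeSketch {
    private final Class<?> outputKeyClass; // stands in for setOutputKeyClass
    private Class<?> mapOutputKeyClass;    // stands in for setMapOutputKeyClass; null if never set

    public MapOutputTypeSketch(Class<?> outputKeyClass) {
        this.outputKeyClass = outputKeyClass;
    }

    public void setMapOutputKeyClass(Class<?> keyClass) {
        this.mapOutputKeyClass = keyClass;
    }

    // Falls back to the final output key class when no map output class was set.
    public Class<?> effectiveMapOutputKeyClass() {
        return mapOutputKeyClass != null ? mapOutputKeyClass : outputKeyClass;
    }

    // Models the check made when a mapper emits a key: no cast, no conversion.
    public void collect(Object key) throws IOException {
        Class<?> expected = effectiveMapOutputKeyClass();
        if (key.getClass() != expected) {
            throw new IOException("Type mismatch in key from map: expected "
                    + expected.getName() + ", received " + key.getClass().getName());
        }
    }

    public static void main(String[] args) throws IOException {
        MapOutputTypeSketch job = new MapOutputTypeSketch(String.class);
        job.setMapOutputKeyClass(Long.class); // mapper key type differs from final output
        job.collect(42L);                     // accepted: matches the declared map output class
        System.out.println(job.effectiveMapOutputKeyClass().getName());
    }
}
```

So in this model the setters declare types that are checked at runtime; nothing is cast on your behalf.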