Re: KeyValueTextInputFormat

Mark Fri, 27 Aug 2010 07:42:20 -0700

 On 8/26/10 7:47 PM, newpant wrote:

Hi, do you use JobConf.setInputFormat(KeyValueTextInputFormat.class) to set
the input format class? The default input format class is TextInputFormat, and
its key type is LongWritable, which stores the byte offset of each line in the
file.
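
To make the difference concrete, here is a minimal plain-Java sketch of the two record shapes. It does not use Hadoop: the class InputFormatSketch and its methods readAsText and readAsKeyValue are hypothetical names, and the sketch assumes newline-terminated lines and a tab as the key/value separator.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; these are NOT Hadoop classes.
final class InputFormatSketch {

    // Mimics the TextInputFormat record shape:
    // key = byte offset of the start of the line, value = the whole line.
    static List<Map.Entry<Long, String>> readAsText(String contents) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        long offset = 0;
        for (String line : contents.split("\n")) {
            records.add(new SimpleEntry<>(offset, line));
            // Advance past the line's bytes plus one byte for the newline.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    // Mimics the KeyValueTextInputFormat record shape:
    // each line is split at the first tab into a text key and a text value.
    static List<Map.Entry<String, String>> readAsKeyValue(String contents) {
        List<Map.Entry<String, String>> records = new ArrayList<>();
        for (String line : contents.split("\n")) {
            int tab = line.indexOf('\t');
            if (tab < 0) {
                // No separator: the whole line is the key, the value is empty.
                records.add(new SimpleEntry<>(line, ""));
            } else {
                records.add(new SimpleEntry<>(line.substring(0, tab),
                                              line.substring(tab + 1)));
            }
        }
        return records;
    }

    public static void main(String[] args) {
        String contents = "apple\t1\nbanana\t2\n";
        // Numeric keys: 0 for the first line, 8 for the second.
        System.out.println(readAsText(contents));
        // Text keys: "apple" and "banana".
        System.out.println(readAsKeyValue(contents));
    }
}
```

With the first shape an identity mapper passes numeric keys downstream, which is where a LongWritable-to-Text cast can fail if the rest of the job expects text keys.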


If your mapper's output key or value type differs from the job's final
(reducer) output type, you need to call setMapOutputKeyClass and
setMapOutputValueClass.

2010/8/27 Mark<[email protected]>

When I configure my job to use a KeyValueTextInputFormat, doesn't that
imply that the key and value passed to my mapper will both be Text?

I have it set up like this, and I am using the default Mapper.class, i.e.
IdentityMapper:
- KeyValueTextInputFormat.addInputPath(job, new Path(otherArgs[0]));

but I keep receiving this error:
- java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be
cast to org.apache.hadoop.io.Text

I would expect this error if I were using the FileInputFormat, because that
returns the key as a LongWritable and the value as Text, but I am unsure
why it's happening here.

Also, on the same note, when I supply FileInputFormat or
KeyValueTextInputFormat, does that implicitly set job.setMapOutputKeyClass
and job.setMapOutputValueClass? When are these used?

Thanks for the clarification

No, I didn't set that, and when I did, everything worked as expected. I thought if I used:


KeyValueTextInputFormat.addInputPath(job, new Path(otherArgs[0]))

it would set that for me, or at least know that it would be Text/Text as input. I'm guessing that is wrong.
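
One likely reason the call above does not select the format: addInputPath is a static method inherited from FileInputFormat, so invoking it through KeyValueTextInputFormat only registers the path. The sketch below shows that Java behavior with hypothetical stand-in classes (SketchJob, SketchFileFormat, SketchTextFormat, SketchKeyValueFormat), none of which are Hadoop classes. With the real API, the explicit call would be job.setInputFormatClass(KeyValueTextInputFormat.class), or JobConf.setInputFormat(...) in the older API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; none of these are Hadoop classes.
final class StaticInheritanceSketch {
    public static void main(String[] args) {
        SketchJob job = new SketchJob();

        // Called through the subclass name, but this is the base-class static
        // method: it records the path and leaves the input format untouched.
        SketchKeyValueFormat.addInputPath(job, "input/dir");
        System.out.println(job.inputFormatClass.getSimpleName()); // prints "SketchTextFormat"

        // The format has to be selected explicitly.
        job.setInputFormatClass(SketchKeyValueFormat.class);
        System.out.println(job.inputFormatClass.getSimpleName()); // prints "SketchKeyValueFormat"
    }
}

final class SketchJob {
    // Default format, playing the role of TextInputFormat.
    Class<?> inputFormatClass = SketchTextFormat.class;
    final List<String> inputPaths = new ArrayList<>();

    void setInputFormatClass(Class<?> cls) {
        inputFormatClass = cls;
    }
}

class SketchFileFormat {
    // Declared once on the base class; subclasses inherit it unchanged.
    static void addInputPath(SketchJob job, String path) {
        job.inputPaths.add(path);
    }
}

class SketchTextFormat extends SketchFileFormat {}

class SketchKeyValueFormat extends SketchFileFormat {}
```

A static method has no way to know which subclass name it was called through, so the job keeps its default format until it is set explicitly.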


If your mapper's output key or value type differs from the job's final
(reducer) output type, you need to call setMapOutputKeyClass and
setMapOutputValueClass.

When would this ever come up? Does it just cast to the appropriate classes then?


Thanks
