I did indeed think that addInputPath() set the InputFormat class, so
this is probably what has been my problem. I'll try this when I gain
access to my cluster again on Monday, but I'm fairly confident that this
will fix my program.
Thank you very much for a good answer.
Take care, I will post an update on Monday.
HRoger wrote:
I have read your code, and I think you should add
job.setInputFormatClass(MultiLineInputFormat.class);
If you do not set it, the job uses TextInputFormat, whose value type is
Text by default. You may have thought that
"MultiLineInputFormat.addInputPath()" would set the InputFormat class
automatically, but it does not.
You can also call configuration.set("mapred.job.tracker","local") and add
some log output to debug your program.
Good Luck!
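The following plain-Java sketch may help show why the program type checks in Eclipse and still fails at run time. It uses no Hadoop classes: Reader, LineReader, MultiLineReader, Mapper, BatchMapper and tryRun are hypothetical stand-ins invented for this illustration. The assumption is that the framework picks the reader at run time, so the compiler can never compare the reader's value type with the mapper's parameter type. The cast then happens inside a compiler-generated bridge method, which would be consistent with the trace pointing at Summarizer.java:1.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java stand-ins for illustration only; none of these are Hadoop classes.
class ErasureDemo {
    /** Hypothetical stand-in for a record reader producing values of type V. */
    interface Reader<V> {
        V nextValue();
    }

    /** Plays the role of the default reader: one line of text per value. */
    static class LineReader implements Reader<String> {
        public String nextValue() { return "42\tfoo"; }
    }

    /** Plays the role of the custom reader: a batch of lines per value. */
    static class MultiLineReader implements Reader<List<String>> {
        public List<String> nextValue() { return Arrays.asList("42\tfoo", "7\tbar"); }
    }

    /** Hypothetical stand-in for a mapper base class. */
    static abstract class Mapper<V> {
        abstract int map(V value);

        // The reader is only known at run time, so this cast is unchecked:
        // after erasure it is a cast to Object and always succeeds here.
        @SuppressWarnings("unchecked")
        int run(Reader<?> reader) {
            return map((V) reader.nextValue());
        }
    }

    /** Expects a batch of lines and returns how many it received. */
    static class BatchMapper extends Mapper<List<String>> {
        @Override
        int map(List<String> lines) { return lines.size(); }
    }

    /** Runs the mapper with the given reader and reports what happened. */
    static String tryRun(Reader<?> reader) {
        try {
            return "ok: " + new BatchMapper().run(reader) + " lines";
        } catch (ClassCastException e) {
            // Thrown by the compiler-generated bridge method map(Object),
            // which casts its argument to List before calling map(List).
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        System.out.println(tryRun(new MultiLineReader())); // matching reader
        System.out.println(tryRun(new LineReader()));      // default reader
    }
}
```

With the matching reader the mapper receives a List and works. With the default line reader the same code compiles but throws ClassCastException at the first map() call, which is the same symptom as leaving the input format at its default.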
Per Stolpe wrote:
Hi.
I'm quite new to Hadoop programming, so to get a good start I started
writing my own program that summarizes a column in a large tab separated
file (~100 000 000 lines). My first naive implementation was quite
simple, a small rework of the WordCounter example that comes with
Hadoop. This program did calculate the correct answer, but it performed
quite badly, since every line in the file invokes a call to map(). To
solve this, I wrote my own RecordReader, one that would return a
List<Text> instead of just a Text. It does type check in Eclipse and all
seems to be fine until I actually run the program. When I do, I get the
following error:
java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to java.util.List
    at Summarizer$TokenizerMapper.map(Summarizer.java:1)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:518)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:303)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
(repeated several times)
What might be the problem?
Also, is there perhaps an InputFormat (one not marked as deprecated) that
already solves my problem?
Source code:
Summarizer: http://pastebin.com/m52876939
RecordReader: http://pastebin.com/m2c541a00
InputFormat: http://pastebin.com/m7714b0c
Hadoop version: 0.20.0
Java JDK version: 1.6 u14
Regards,
Per and Felix