Re: Letting the Mapper handle multiple lines.

HRoger Thu, 04 Jun 2009 11:17:23 -0700

I have read your code, and I think you should add
job.setInputFormatClass(MultiLineInputFormat.class);
If you do not set it, the job uses TextInputFormat, so the value passed to
your mapper is a Text by default. You may have thought that
"MultiLineInputFormat.addInputPath()" would set the InputFormatClass
automatically, but it doesn't.
You can also set configuration.set("mapred.job.tracker", "local") and add
some logging to debug your program.
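
This is also why it type checks in Eclipse but fails when you run it: the
generic types are erased, so nothing checks what the input format really
delivers until the value reaches your map(). Here is a minimal plain-Java
sketch of that effect. It does not use any Hadoop classes; ValueSource,
LineSource, BatchSource, read and runMapper are hypothetical names I made
up for illustration, with LineSource standing in for the default
one-line-per-value format and BatchSource for a multi-line one.

```java
import java.util.Arrays;
import java.util.List;

public class ErasureDemo {
    /** Hypothetical stand-in for a record reader: hands out one value. */
    interface ValueSource {
        Object nextValue();
    }

    /** Plays the role of the default format: one line per value. */
    static class LineSource implements ValueSource {
        public Object nextValue() { return "a\tb"; }
    }

    /** Plays the role of a multi-line format: a batch of lines per value. */
    static class BatchSource implements ValueSource {
        public Object nextValue() { return Arrays.asList("a\tb", "c\td"); }
    }

    /** Generic plumbing: this cast is unchecked and erased at runtime. */
    @SuppressWarnings("unchecked")
    static <V> V read(ValueSource source) {
        return (V) source.nextValue();
    }

    /** The "mapper" expects a list; the real cast happens at the assignment. */
    static String runMapper(ValueSource source) {
        try {
            List<String> lines = read(source);
            return "mapped " + lines.size() + " lines";
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        System.out.println(runMapper(new LineSource()));
        System.out.println(runMapper(new BatchSource()));
    }
}
```

With LineSource the call compiles fine but ends in a ClassCastException,
and with BatchSource it maps two lines. Which source is plugged in is
decided at run time, much like the input format class of a job.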


Good Luck!

Per Stolpe wrote:
> 
> Hi.
> I'm quite new to Hadoop programming, so to get a good start I started
> writing my own program that summarizes a column in a large tab separated
> file (~100 000 000 lines). My first naive implementation was quite simple,
> a small rework of the WordCounter example that comes with Hadoop. This
> program did calculate the correct answer, but it performed quite badly,
> since every line in the file invokes a call to map(). To solve this, I
> wrote my own RecordReader, one that would return a List<Text> instead of
> just a Text. It does type check in Eclipse and all seems to be fine until
> I actually run the program. When I do, I get the following error:
> 
> java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to
> java.util.List
>         at Summarizer$TokenizerMapper.map(Summarizer.java:1)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:518)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:303)
>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
> 
> (repeated several times)
> 
> What might be the problem?
> And is there maybe an InputFormat (one that is not marked as Deprecated)
> that already solves my problem?
> 
> Source code:
> Summarizer: http://pastebin.com/m52876939
> RecordReader: http://pastebin.com/m2c541a00
> InputFormat: http://pastebin.com/m7714b0c
> 
> Hadoop version: 0.20.0
> Java JDK version: 1.6 u14
> 
> Regards,
> Per and Felix
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Letting-the-Mapper-handle-multiple-lines.-tp23873214p23875177.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.