I had a look at your code, and I think you need to add job.setInputFormatClass(MultiLineInputFormat.class); to your job setup. If you don't set it, the job falls back to TextInputFormat, whose values are plain Text by default, which is exactly the cast that fails. You may have assumed that MultiLineInputFormat.addInputPath() would set the input format class automatically, but it doesn't. You can also set configuration.set("mapred.job.tracker", "local") and add some logging to make the program easier to debug locally.
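
A minimal driver sketch of what I mean (it reuses the class names from your post: Summarizer, TokenizerMapper, MultiLineInputFormat; the output key/value types are only placeholders, change them to whatever your mapper and reducer actually emit):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Summarizer {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // run the whole job in one local JVM so you can attach a debugger / read logs easily
    conf.set("mapred.job.tracker", "local");

    Job job = new Job(conf, "column summarizer");
    job.setJarByClass(Summarizer.class);
    job.setMapperClass(TokenizerMapper.class);
    // without this line the job defaults to TextInputFormat, which feeds the
    // mapper plain Text values -> your ClassCastException
    job.setInputFormatClass(MultiLineInputFormat.class);
    job.setOutputKeyClass(Text.class);             // placeholder
    job.setOutputValueClass(DoubleWritable.class); // placeholder

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
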
Good Luck!

Per Stolpe wrote:
>
> Hi.
> I'm quite new to Hadoop programming, so to get a good start I started
> writing my own program that summarizes a column in a large tab separated
> file (~100 000 000 lines). My first naive implementation was quite simple, a
> small rework of the WordCounter example that comes with Hadoop. This program
> did calculate the correct answer, but it performed quite badly, since every
> line in the file invokes a call to map(). To solve this, I wrote my own
> RecordReader, one that would return a List<Text> instead of just a Text. It
> does type check in Eclipse and all seems to be fine until I actually run the
> program. When I do, I get the following error:
>
> java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to java.util.List
>     at Summarizer$TokenizerMapper.map(Summarizer.java:1)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:518)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:303)
>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
> (repeated several times)
>
> What might be the problem?
> And are there maybe InputFormats (that are not marked as Deprecated) that
> already solve my problem?
>
> Source code:
> Summarizer: http://pastebin.com/m52876939
> RecordReader: http://pastebin.com/m2c541a00
> InputFormat: http://pastebin.com/m7714b0c
>
> Hadoop version: 0.20.0
> Java JDK version: 1.6 u14
>
> Regards,
> Per and Felix
>

--
View this message in context: http://www.nabble.com/Letting-the-Mapper-handle-multiple-lines.-tp23873214p23875177.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
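
P.S. For anyone reading this in the archive: below is a rough, untested sketch of the kind of batching InputFormat/RecordReader described in the question, built on top of the stock LineRecordReader from the new (mapreduce) API. It is not the code from the pastebin links; the class name, the batch size, and the choice of key are my own placeholders.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

public class MultiLineInputFormat extends FileInputFormat<LongWritable, List<Text>> {

  private static final int LINES_PER_RECORD = 1000; // batch size, pick what suits you

  @Override
  public RecordReader<LongWritable, List<Text>> createRecordReader(
      InputSplit split, TaskAttemptContext context) {
    return new MultiLineRecordReader();
  }

  public static class MultiLineRecordReader extends RecordReader<LongWritable, List<Text>> {
    private final LineRecordReader lineReader = new LineRecordReader();
    private LongWritable key;
    private List<Text> value;

    @Override
    public void initialize(InputSplit split, TaskAttemptContext context)
        throws IOException, InterruptedException {
      lineReader.initialize(split, context);
    }

    @Override
    public boolean nextKeyValue() throws IOException, InterruptedException {
      // collect up to LINES_PER_RECORD lines into one value
      value = new ArrayList<Text>(LINES_PER_RECORD);
      key = null;
      for (int i = 0; i < LINES_PER_RECORD && lineReader.nextKeyValue(); i++) {
        if (key == null) {
          // use the file offset of the first line in the batch as the key
          key = new LongWritable(lineReader.getCurrentKey().get());
        }
        // copy: LineRecordReader reuses its Text instance between calls
        value.add(new Text(lineReader.getCurrentValue()));
      }
      return !value.isEmpty();
    }

    @Override public LongWritable getCurrentKey() { return key; }
    @Override public List<Text> getCurrentValue() { return value; }
    @Override public float getProgress() throws IOException { return lineReader.getProgress(); }
    @Override public void close() throws IOException { lineReader.close(); }
  }
}

With this, the mapper would be declared as Mapper<LongWritable, List<Text>, ...> and iterate over the list inside a single map() call.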