After specifying the NLineInputFormat option, the streaming job fails with:

Error from attempt_201205171448_0092_m_000000_0: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2

It spawns two mappers, but I am not sure whether each mapper runs with one of the file names specified in the input option. I was expecting one mapper to run with /user/devi/s_input/a.txt and one mapper to run with /user/devi/s_input/b.txt. I dug into the task files, but could not find anything.

Here is the simple mapper Perl script. All it does is read the file and print it. (It needs to do much more, but I could not get even this basic job to run.)

    $i = 0;
    $userinput = <STDIN>;
    open(INFILE, "$userinput") || die "could not open the file $userinput \n";
    while (<INFILE>) {
        my $line = $_;
        print "$i" . $line;
        $i++;
    }
    close(INFILE);
    exit;

My command is:

    hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u3.jar \
        -input /user/devi/file.txt \
        -output /user/devi/s_output \
        -mapper "/usr/bin/perl /home/devi/Perl/crash_parser.pl" \
        -inputformat org.apache.hadoop.mapred.lib.NLineInputFormat

Really appreciate your help.

Devi

________________________________
From: Robert Evans <ev...@yahoo-inc.com>
To: "common-user@hadoop.apache.org" <common-user@hadoop.apache.org>
Sent: Thu, August 2, 2012 1:15:05 PM
Subject: Re: Hadoop : java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text

The default text input format has a key of a LongWritable that is the offset into the file. The value is the full line.
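[Editor's note: part of the confusion in the streaming question is what the mapper subprocess actually receives. A streaming mapper reads its records from STDIN, one per line; it does not receive file names to open. The Perl script above takes the first stdin line (a record, with a trailing newline) and tries to open it as a file, so open() fails and the script dies, which would explain the nonzero exit status. As an illustrative sketch only, assuming Hadoop Streaming's usual key<TAB>value framing (the class name and the tab-stripping are my additions, not from the thread), an equivalent stdin-reading mapper that numbers lines might look like this:]

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

// Hypothetical stdin-reading streaming mapper: reads one record per line
// and emits it prefixed with a running line counter.
public class LineNumberMapper {
    public static void main(String[] args) throws Exception {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        int i = 0;
        while ((line = in.readLine()) != null) {
            // Depending on the input format, streaming may prepend the key
            // (e.g. the byte offset) followed by a tab; keep only the value
            // when a tab is present.
            int tab = line.indexOf('\t');
            String value = tab >= 0 ? line.substring(tab + 1) : line;
            System.out.println(i + "\t" + value);
            i++;
        }
    }
}
```

[The same fix applies to the Perl version: iterate over <STDIN> directly instead of opening the line as a file.]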
On 8/2/12 2:59 PM, "Harit Himanshu" <harit.himan...@gmail.com> wrote:
> StackOverflow link -
> http://stackoverflow.com/questions/11784729/hadoop-java-lang-classcastexception-org-apache-hadoop-io-longwritable-cannot
>
> ----------
>
> My program looks like
>
>     public class TopKRecord extends Configured implements Tool {
>
>         public static class MapClass extends Mapper<Text, Text, Text, Text> {
>
>             public void map(Text key, Text value, Context context)
>                     throws IOException, InterruptedException {
>                 // your map code goes here
>                 String[] fields = value.toString().split(",");
>                 String year = fields[1];
>                 String claims = fields[8];
>
>                 if (claims.length() > 0 && (!claims.startsWith("\""))) {
>                     context.write(new Text(year.toString()), new Text(claims.toString()));
>                 }
>             }
>         }
>
>         public int run(String args[]) throws Exception {
>             Job job = new Job();
>             job.setJarByClass(TopKRecord.class);
>
>             job.setMapperClass(MapClass.class);
>
>             FileInputFormat.setInputPaths(job, new Path(args[0]));
>             FileOutputFormat.setOutputPath(job, new Path(args[1]));
>
>             job.setJobName("TopKRecord");
>             job.setMapOutputValueClass(Text.class);
>             job.setNumReduceTasks(0);
>             boolean success = job.waitForCompletion(true);
>             return success ? 0 : 1;
>         }
>
>         public static void main(String args[]) throws Exception {
>             int ret = ToolRunner.run(new TopKRecord(), args);
>             System.exit(ret);
>         }
>     }
>
> The data looks like
>
>     "PATENT","GYEAR","GDATE","APPYEAR","COUNTRY","POSTATE","ASSIGNEE","ASSCODE","CLAIMS","NCLASS","CAT","SUBCAT","CMADE","CRECEIVE","RATIOCIT","GENERAL","ORIGINAL","FWDAPLAG","BCKGTLAG","SELFCTUB","SELFCTLB","SECDUPBD","SECDLWBD"
>     3070801,1963,1096,,"BE","",,1,,269,6,69,,1,,0,,,,,,,
>     3070802,1963,1096,,"US","TX",,1,,2,6,63,,0,,,,,,,,,
>     3070803,1963,1096,,"US","IL",,1,,2,6,63,,9,,0.3704,,,,,,,
>     3070804,1963,1096,,"US","OH",,1,,2,6,63,,3,,0.6667,,,,,,,
>
> On running this program I see the following on the console
>
>     12/08/02 12:43:34 INFO mapred.JobClient: Task Id : attempt_201208021025_0007_m_000000_0, Status : FAILED
>     java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text
>         at com.hadoop.programs.TopKRecord$MapClass.map(TopKRecord.java:26)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>         at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>         at org.apache.hadoop.mapred.Child.main(Child.java:249)
>
> I believe that the class types are mapped correctly per Class Mapper
> (http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/Mapper.html).
>
> Please let me know what it is that I am doing wrong here.
>
> Thank you
>
> + Harit Himanshu
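[Editor's note: the fix that follows from Robert's explanation is to declare the mapper's input key as LongWritable, since that is what the default TextInputFormat produces (the byte offset into the file), even if the key is never used. A minimal sketch of the corrected inner class, to be placed inside TopKRecord as before (the @Override annotation is my addition; it makes the compiler catch this kind of signature mismatch):]

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Input key type must match what TextInputFormat produces:
// LongWritable (byte offset), not Text.
public static class MapClass extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",");
        String year = fields[1];
        String claims = fields[8];

        if (claims.length() > 0 && !claims.startsWith("\"")) {
            context.write(new Text(year), new Text(claims));
        }
    }
}
```

[This fragment only compiles against the Hadoop mapreduce API inside the enclosing job class; it is shown here for the signature change, not as a standalone program.]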