After specifying the NLineInputFormat option, the streaming job fails with:

Error from attempt_201205171448_0092_m_000000_0: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2

It spawns two mappers, but I am not sure whether each mapper runs with one of the file names specified in the input option. I was expecting one mapper to run with /user/devi/s_input/a.txt and one mapper to run with /user/devi/s_input/b.txt. I dug into the task files but could not find anything.

Here is the simple mapper Perl script. All it does is read the file and print it. (It needs to do much more, but I could not get even this basic job to run.)

$i = 0;
$userinput = <STDIN>;
open(INFILE, "$userinput") || die "could not open the file $userinput\n";
while (<INFILE>) {
    my $line = $_;
    print "$i" . $line;
    $i++;
}
close(INFILE);
exit;

My command is:

hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u3.jar -input /user/devi/file.txt -output /user/devi/s_output -mapper "/usr/bin/perl /home/devi/Perl/crash_parser.pl" -inputformat org.apache.hadoop.mapred.lib.NLineInputFormat

Really appreciate your help.

Devi
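A likely cause of the exit code: when open() fails, Perl's die exits with the value of $!, and "No such file or directory" (ENOENT) is errno 2, which matches "subprocess failed with code 2". The open() above is almost guaranteed to fail twice over: the line read from STDIN still carries its trailing newline, and the path it names lives in HDFS, not on the task's local disk. A minimal sketch of a mapper that handles both (the hadoop fs -cat pipe and the tab-splitting are assumptions about how the records arrive; paths are assumed to contain no tabs):

#!/usr/bin/perl
use strict;
use warnings;

# Each record on stdin names one HDFS file. Depending on the streaming
# version and the input format, the record may arrive as the bare path
# or as "byteoffset<TAB>path", so keep only the last tab-separated field.
while (my $record = <STDIN>) {
    chomp $record;                  # the trailing newline alone breaks open()
    my $path = (split /\t/, $record)[-1];

    # The path names a file in HDFS, which a plain open() cannot reach;
    # stream it out of HDFS instead.
    open(my $in, '-|', "hadoop fs -cat $path")
        or die "could not read $path: $!\n";

    my $i = 0;
    while (my $line = <$in>) {
        print $i . $line;           # same numbered-line output as the original
        $i++;
    }
    close($in);
}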
________________________________
From: Robert Evans <ev...@yahoo-inc.com>
To: "mapreduce-u...@hadoop.apache.org" <mapreduce-u...@hadoop.apache.org>; "common-user@hadoop.apache.org" <common-user@hadoop.apache.org>
Sent: Thu, August 2, 2012 1:16:54 PM
Subject: Re: Issue with Hadoop Streaming

http://www.mail-archive.com/core-user@hadoop.apache.org/msg07382.html

From: Devi Kumarappan <kpala...@att.net>
Reply-To: "mapreduce-u...@hadoop.apache.org" <mapreduce-u...@hadoop.apache.org>
Date: Thursday, August 2, 2012 3:03 PM
To: "common-user@hadoop.apache.org" <common-user@hadoop.apache.org>, "mapreduce-u...@hadoop.apache.org" <mapreduce-u...@hadoop.apache.org>
Subject: Re: Issue with Hadoop Streaming

My mapper is a Perl script, not Java, so how do I specify the NLineInputFormat?

________________________________
From: Robert Evans <ev...@yahoo-inc.com>
To: "mapreduce-u...@hadoop.apache.org" <mapreduce-u...@hadoop.apache.org>; "common-user@hadoop.apache.org" <common-user@hadoop.apache.org>
Sent: Thu, August 2, 2012 12:59:50 PM
Subject: Re: Issue with Hadoop Streaming

It depends on the input format you use. You probably want to look at using NLineInputFormat.

From: Devi Kumarappan <kpala...@att.net>
Reply-To: "mapreduce-u...@hadoop.apache.org" <mapreduce-u...@hadoop.apache.org>
Date: Wednesday, August 1, 2012 8:09 PM
To: "common-user@hadoop.apache.org" <common-user@hadoop.apache.org>, "mapreduce-u...@hadoop.apache.org" <mapreduce-u...@hadoop.apache.org>
Subject: Issue with Hadoop Streaming

I am trying to run Hadoop streaming with a Perl script as the mapper and no reducer. My requirement is for the mapper to run on one file at a time, since I have to do pattern processing on the entire contents of one file at a time and the files are small. The Hadoop streaming manual suggests the following solution; a sketch of this recipe follows the list.

* Generate a file containing the full HDFS paths of the input files. Each map task gets one file name as input.
* Create a mapper script which, given a filename, will get the file to local disk, gzip the file and put it back in the desired output directory.
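For concreteness, that mapper recipe might look like the sketch below (assumptions: the output directory /user/devi/gzipped stands in for "the desired output directory", and the paths contain no shell metacharacters):

#!/usr/bin/perl
use strict;
use warnings;
use File::Basename;

# Streaming manual recipe: each map task reads one HDFS path from stdin,
# copies the file to local disk, gzips it, and puts the result back.
while (my $path = <STDIN>) {
    chomp $path;
    my $name = basename($path);
    system("hadoop fs -get $path $name") == 0 or die "get failed: $?\n";
    system("gzip $name") == 0                 or die "gzip failed: $?\n";
    # /user/devi/gzipped is a hypothetical output directory
    system("hadoop fs -put $name.gz /user/devi/gzipped/") == 0
        or die "put failed: $?\n";
}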
I am running the following command:

hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u3.jar -input /user/devi/file.txt -output /user/devi/s_output -mapper "/usr/bin/perl /home/devi/Perl/crash_parser.pl"

/user/devi/file.txt contains the following two lines:

/user/devi/s_input/a.txt
/user/devi/s_input/b.txt

When this runs, instead of spawning two mappers for a.txt and b.txt as per the document, only one mapper is spawned, and the Perl script gets both /user/devi/s_input/a.txt and /user/devi/s_input/b.txt as its input. How can I make the mapper Perl script run on only one file at a time?

Appreciate your help. Thanks,

Devi
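A single mapper here is expected: the default TextInputFormat splits input by byte size, and a two-line file fits in one split, so one map task receives both lines. NLineInputFormat splits by line count instead, one line per mapper by default with the old mapred API. A sketch of the command, using the same paths as above; the -D generic option must come before the streaming options, and mapred.line.input.format.linespermap=1 just makes the default explicit:

hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u3.jar \
    -D mapred.line.input.format.linespermap=1 \
    -inputformat org.apache.hadoop.mapred.lib.NLineInputFormat \
    -input /user/devi/file.txt \
    -output /user/devi/s_output \
    -mapper "/usr/bin/perl /home/devi/Perl/crash_parser.pl"

One caveat: with a non-default input format, streaming may hand the mapper each record as "key<TAB>value" rather than the bare line, so the script should strip a leading byte-offset key, as in the sketch near the top of the thread.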