http://www.mail-archive.com/core-user@hadoop.apache.org/msg07382.html
From: Devi Kumarappan <kpala...@att.net>
Reply-To: mapreduce-user@hadoop.apache.org
Date: Thursday, August 2, 2012 3:03 PM
To: common-u...@hadoop.apache.org, mapreduce-user@hadoop.apache.org
Subject: Re: Issue with Hadoop Streaming

My mapper is a Perl script, not Java. So how do I specify the NLineInputFormat?

________________________________
From: Robert Evans <ev...@yahoo-inc.com>
To: mapreduce-user@hadoop.apache.org; common-u...@hadoop.apache.org
Sent: Thu, August 2, 2012 12:59:50 PM
Subject: Re: Issue with Hadoop Streaming

It depends on the input format you use.
You probably want to look at using NLineInputFormat.

From: Devi Kumarappan <kpala...@att.net>
Reply-To: mapreduce-user@hadoop.apache.org
Date: Wednesday, August 1, 2012 8:09 PM
To: common-u...@hadoop.apache.org, mapreduce-user@hadoop.apache.org
Subject: Issue with Hadoop Streaming

I am trying to run Hadoop streaming using a Perl script as the mapper and with no reducer. My requirement is for the mapper to run on one file at a time, since I have to do pattern processing on the entire contents of one file at a time and the files are small.

The Hadoop streaming manual suggests the following solution:

* Generate a file containing the full HDFS path of the input files. Each map task would get one file name as input.
* Create a mapper script which, given a filename, will get the file to local disk, gzip the file and put it back in the desired output directory.

I am running the following command.

hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u3.jar -input /user/devi/file.txt -output /user/devi/s_output -mapper "/usr/bin/perl /home/devi/Perl/crash_parser.pl"

/user/devi/file.txt contains the following two lines.

/user/devi/s_input/a.txt
/user/devi/s_input/b.txt

When this runs, instead of spawning two mappers for a.txt and b.txt as per the documentation, only one mapper is spawned, and the Perl script gets both /user/devi/s_input/a.txt and /user/devi/s_input/b.txt as its input. How can I make the mapper Perl script run on only one file at a time?

Appreciate your help,
Thanks,
Devi
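
For reference, the NLineInputFormat suggestion above could be tried roughly as follows. This is only a sketch under assumptions, not a tested job: it reuses the jar and paths from the original command, passes the old-API class org.apache.hadoop.mapred.lib.NLineInputFormat through the streaming -inputformat option so that each line of file.txt becomes its own split, and adds -D mapred.reduce.tasks=0 as an assumed way to request no reducer. The function names submit_job and path_from_record are hypothetical. With a non-default input format, streaming may hand the mapper the key (a byte offset) and a tab in front of the path, so the helper keeps only the path; check what your version actually sends.

```shell
#!/bin/sh

# Hypothetical helper: drop an optional "<byte offset><TAB>" key prefix from a
# streaming record and print only the HDFS path. cut uses TAB as its default
# delimiter and prints lines without a TAB unchanged.
path_from_record() {
    printf '%s\n' "$1" | cut -f2
}

# Hypothetical wrapper around the original command. Assumption: the generic
# -D option must appear before the streaming-specific options.
submit_job() {
    hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u3.jar \
        -D mapred.reduce.tasks=0 \
        -inputformat org.apache.hadoop.mapred.lib.NLineInputFormat \
        -input /user/devi/file.txt \
        -output /user/devi/s_output \
        -mapper "/usr/bin/perl /home/devi/Perl/crash_parser.pl"
}
```

The mapper itself still has to fetch the named file from HDFS (for example with hadoop fs -get) before parsing it, since it receives only the path on standard input, not the file contents.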