Hello Harsh,

Thanks for your answer. The problem is that I'm using version 0.20.2 and, as far as I could check, NLineInputFormat is not implemented there (at least I couldn't find it). Switching to another version would be a big deal for my infrastructure, since I'm using VMs deployed from images already pre-configured with 0.20.2, so it is not an option at the moment. What should I do?

Thanks,
Lehel.

--- On Sun, 5/20/12, Harsh J <ha...@cloudera.com> wrote:

From: Harsh J <ha...@cloudera.com>
Subject: Re: Set number of mappers by the number of input lines for a single file?
To: common-user@hadoop.apache.org
Date: Sunday, May 20, 2012, 12:52 PM

Lehel,

You may use the NLineInputFormat with N=1:
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/NLineInputFormat.html

On Sun, May 20, 2012 at 2:48 PM, biro lehel <lehel.b...@yahoo.com> wrote:
> Dear all,
>
> I have one single input file, which contains, on every line, some
> hydrological calibration models (data). Each line of the file should be
> processed and then the output from every line written to another single
> output file.
>
> I understood that hadoop spawns mapper tasks with the same number as how many
> input files there are (meaning, in my case, a single mapper would be
> generated). However, I want that a mapper to be dealing with only a single
> line from my input file (nr. of mapper tasks = number of lines in my file).
>
> What is the best way to obtain such behavior? How should I specify this to
> Hadoop?
>
> Any suggestions are more than welcome.
>
> Thank you,
> Lehel.

--
Harsh J
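
For reference, the idea behind "one mapper per line" can be sketched without any Hadoop classes: the input file is cut into byte ranges that each cover N lines, and each range would then be handed to its own map task. The snippet below is only a minimal illustration of that splitting step, not Hadoop's implementation. The class name LineSplits and the method computeSplits are hypothetical, and it assumes the whole file fits in memory and uses "\n" line endings.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: computes byte ranges that each
// cover `linesPerSplit` lines, which is the idea behind an N-line input format.
final class LineSplits {

    // Returns {startOffset, length} pairs; each pair would feed one map task.
    static List<long[]> computeSplits(byte[] data, int linesPerSplit) {
        if (linesPerSplit < 1) {
            throw new IllegalArgumentException("linesPerSplit must be >= 1");
        }
        List<long[]> splits = new ArrayList<>();
        long start = 0;
        int linesInSplit = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\n') {
                linesInSplit++;
                if (linesInSplit == linesPerSplit) {
                    // Close the current split right after the newline.
                    splits.add(new long[] {start, i + 1 - start});
                    start = i + 1;
                    linesInSplit = 0;
                }
            }
        }
        // Leftover lines (or a last line without a newline) form a final split.
        if (start < data.length) {
            splits.add(new long[] {start, data.length - start});
        }
        return splits;
    }

    public static void main(String[] args) {
        // Three made-up input lines; with N=1 each line becomes its own split.
        byte[] data = "model-a\nmodel-b\nmodel-c\n".getBytes(StandardCharsets.UTF_8);
        for (long[] s : computeSplits(data, 1)) {
            System.out.println("offset=" + s[0] + " length=" + s[1]);
        }
    }
}
```

With linesPerSplit set to 1, the number of splits equals the number of lines in the file, which is the behavior asked for in the original question. A custom input format for a version lacking NLineInputFormat would presumably need to produce splits along these lines.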