Re: Set number of mappers by the number of input lines for a single file?

biro lehel Sun, 20 May 2012 03:34:05 -0700

Hello Harsh,

Thanks for your answer. The problem is that I'm using version 0.20.2 and, as far
as I checked, NLineInputFormat is not implemented there (at least I couldn't find
it). Switching to another version would be a big deal in my infrastructure,
since I'm using VMs deployed from images already pre-configured with 0.20.2, so
it is not an option at the moment. What should I do?
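One version-independent workaround can be sketched as follows. It assumes that, with the default FileInputFormat and default split-size settings, each input file smaller than the HDFS block size yields exactly one split and therefore one map task. Under that assumption, pre-splitting the single input file into one file per line gives one mapper per line. The function name `split_into_line_files` and the `part-NNNNN` file names are hypothetical, chosen only for illustration.

```python
import os
from typing import List


def split_into_line_files(input_path: str, output_dir: str) -> List[str]:
    """Write each non-empty line of input_path to its own file in output_dir.

    Returns the created file paths in input order. Each output file holds
    exactly one line, so (under the one-split-per-small-file assumption)
    each file becomes one map task.
    """
    os.makedirs(output_dir, exist_ok=True)
    created: List[str] = []
    index = 0
    with open(input_path, "r", encoding="utf-8") as src:
        for raw in src:
            line = raw.rstrip("\r\n")
            if not line:
                continue  # skip empty lines so they don't become empty map tasks
            path = os.path.join(output_dir, "part-%05d" % index)
            with open(path, "w", encoding="utf-8") as dst:
                dst.write(line + "\n")
            created.append(path)
            index += 1
    return created
```

The resulting directory could then be copied to HDFS (for example with `hadoop fs -put`) and used as the job's input path. This is only reasonable for a modest number of lines, since every tiny file adds per-file and per-task overhead.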


Thanks, 
Lehel.

--- On Sun, 5/20/12, Harsh J <ha...@cloudera.com> wrote:

From: Harsh J <ha...@cloudera.com>
Subject: Re: Set number of mappers by the number of input lines for a single file?
To: common-user@hadoop.apache.org
Date: Sunday, May 20, 2012, 12:52 PM

Lehel,

You may use the NLineInputFormat with N=1:
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/NLineInputFormat.html
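To illustrate what N=1 means, here is a minimal Python simulation (not Hadoop code) of how NLineInputFormat groups input lines into splits: each split holds at most N lines, and each record is a (byte offset, line) pair, mirroring the (LongWritable, Text) key/value pairs a mapper receives. With N=1 every line becomes its own split, so the number of map tasks equals the number of lines. In the old `mapred` API linked above, N is read from the `mapred.line.input.format.linespermap` job property and defaults to 1. The function name `n_line_splits` and the sample lines are hypothetical, and the offsets assume UTF-8 text with `\n` line endings.

```python
from typing import List, Tuple

# A record mirrors the mapper's (key, value): (byte offset, line text).
Record = Tuple[int, str]


def n_line_splits(lines: List[str], n: int = 1) -> List[List[Record]]:
    """Group lines into splits of at most n lines, like NLineInputFormat."""
    if n < 1:
        raise ValueError("n must be >= 1")
    records: List[Record] = []
    offset = 0
    for line in lines:
        records.append((offset, line))
        # Advance by the line's encoded length plus one byte for '\n'.
        offset += len(line.encode("utf-8")) + 1
    # Consecutive chunks of n records; each chunk would be one map task.
    return [records[i:i + n] for i in range(0, len(records), n)]
```

For example, `n_line_splits(["model-a", "model-b", "model-c"])` returns three single-record splits, the second being `[(8, "model-b")]`; with `n=2` the same input yields two splits.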

On Sun, May 20, 2012 at 2:48 PM, biro lehel <lehel.b...@yahoo.com> wrote:
> Dear all,
>
> I have a single input file in which every line contains hydrological
> calibration model data. Each line of the file should be processed, and the
> output from every line written to another single output file.
>
> I understood that Hadoop spawns as many mapper tasks as there are input files
> (meaning that, in my case, a single mapper would be created). However, I want
> each mapper to deal with only a single line of my input file (number of map
> tasks = number of lines in my file).
>
> What is the best way to obtain such behavior? How should I specify this to 
> Hadoop?
>
> Any suggestions are more than welcome.
>
> Thank you,
> Lehel.



-- 
Harsh J
