Re: Set number of mappers by the number of input lines for a single file?

Harsh J Sun, 20 May 2012 03:55:01 -0700

Biro,

0.20.2 does carry NLineInputFormat, but only in the older, stable API
package (org.apache.hadoop.mapred.*, which was marked deprecated and
later un-deprecated). See the 0.20.2 Javadoc:
http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/lib/NLineInputFormat.html
On 0.20.2, I recommend sticking to the mapred.* API package.
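
A minimal sketch of what an old-API driver could look like on 0.20.2,
in case it helps. The class names OneLinePerMapper and LineMapper, the
pass-through map logic, and the use of args[0]/args[1] as input and
output paths are placeholders of mine, not anything from your setup.
The property mapred.line.input.format.linespermap is the one the
mapred.lib.NLineInputFormat reads; it defaults to 1, so setting it is
only there to make the intent explicit.

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.NLineInputFormat;

public class OneLinePerMapper {

  // Placeholder mapper: receives (byte offset, line) and emits it unchanged.
  // Replace the body with the real per-line processing.
  public static class LineMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, LongWritable, Text> {
    public void map(LongWritable offset, Text line,
                    OutputCollector<LongWritable, Text> out, Reporter reporter)
        throws IOException {
      out.collect(offset, line);
    }
  }

  // Builds the job configuration without submitting it.
  static JobConf buildJob(String input, String output) {
    JobConf conf = new JobConf(OneLinePerMapper.class);
    conf.setJobName("one-line-per-mapper");

    // Old-API (mapred.*) input format: each split holds N lines of the file.
    conf.setInputFormat(NLineInputFormat.class);
    // N = 1, i.e. one map task per input line (1 is also the default).
    conf.setInt("mapred.line.input.format.linespermap", 1);

    conf.setMapperClass(LineMapper.class);
    conf.setOutputKeyClass(LongWritable.class);
    conf.setOutputValueClass(Text.class);

    // No reducer class is set, so the default identity reducer is used;
    // a single reduce task means a single output file.
    conf.setNumReduceTasks(1);

    FileInputFormat.setInputPaths(conf, new Path(input));
    FileOutputFormat.setOutputPath(conf, new Path(output));
    return conf;
  }

  public static void main(String[] args) throws IOException {
    // args[0] = input file, args[1] = output directory (must not exist yet).
    JobClient.runJob(buildJob(args[0], args[1]));
  }
}
```

Keep in mind that every input line becomes its own map task, so this
only makes sense when the per-line work is heavy enough to outweigh the
task startup cost.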


For the new API (mapreduce.* package) version, you can also grab the
source from the link below and include it in your project, keeping the
license header (and following whatever else the license requires):
http://svn.apache.org/repos/asf/hadoop/common/branches/branch-1/src/mapred/org/apache/hadoop/mapreduce/lib/input/NLineInputFormat.java

Hope this helps.

On Sun, May 20, 2012 at 4:03 PM, biro lehel <lehel.b...@yahoo.com> wrote:
> Hello Harsh,
>
> Thanks for your answer. The problem is that I'm using version 0.20.2, and, 
> as far as I can tell, NLineInputFormat is not implemented there (at least I 
> couldn't find it). Switching to another version would be a big deal in my 
> infrastructure, since I'm using VMs deployed from images already 
> pre-configured with 0.20.2, so it is not an option at the moment. What 
> should I do?
>
> Thanks,
> Lehel.
>
> --- On Sun, 5/20/12, Harsh J <ha...@cloudera.com> wrote:
>
> From: Harsh J <ha...@cloudera.com>
> Subject: Re: Set number of mappers by the number of input lines for a single 
> file?
> To: common-user@hadoop.apache.org
> Date: Sunday, May 20, 2012, 12:52 PM
>
> Lehel,
>
> You may use the NLineInputFormat with N=1:
> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/NLineInputFormat.html
>
> On Sun, May 20, 2012 at 2:48 PM, biro lehel <lehel.b...@yahoo.com> wrote:
>> Dear all,
>>
>> I have one single input file, which contains, on every line, some 
>> hydrological calibration models (data). Each line of the file should be 
>> processed and then the output from every line written to another single 
>> output file.
>>
>> As I understand it, Hadoop spawns as many mapper tasks as there are input 
>> files (meaning, in my case, a single mapper would be generated). However, 
>> I want each mapper to deal with only a single line of my input file 
>> (number of mapper tasks = number of lines in my file).
>>
>> What is the best way to obtain such behavior? How should I specify this to 
>> Hadoop?
>>
>> Any suggestions are more than welcome.
>>
>> Thank you,
>> Lehel.
>
>
>
> --
> Harsh J



-- 
Harsh J
