Your scenario matches what NLineInputFormat does exactly, so that looks like 
the best solution. If you wrote your own input format, it would basically 
have to do what NLineInputFormat already does for you.
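
To make the "map tasks vs. map() calls" distinction concrete, here is a rough 
plain-Java sketch with no Hadoop dependency. NLineSketch and groupLines are 
hypothetical names made up for this illustration, not Hadoop APIs, and the 
real NLineInputFormat builds its splits from file offsets rather than from 
in-memory lists. It only shows the counting: with 16 lines and 5 lines per 
map, you get several map tasks, but map() is still invoked once per line.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: simulates how lines are grouped into N-line splits.
// Nothing here is a Hadoop class; all names are hypothetical.
class NLineSketch {

    // Groups the input lines into chunks of at most linesPerMap lines,
    // one chunk per (simulated) map task.
    static List<List<String>> groupLines(List<String> lines, int linesPerMap) {
        if (linesPerMap <= 0) {
            throw new IllegalArgumentException("linesPerMap must be positive");
        }
        List<List<String>> splits = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += linesPerMap) {
            int end = Math.min(start + linesPerMap, lines.size());
            splits.add(new ArrayList<>(lines.subList(start, end)));
        }
        return splits;
    }

    public static void main(String[] args) {
        // A 16-line input, as in the thread.
        List<String> lines = new ArrayList<>();
        for (int i = 1; i <= 16; i++) {
            lines.add("line " + i);
        }

        List<List<String>> splits = groupLines(lines, 5);

        // Each split goes to one map task; within a task, the map
        // function is called once for every line (record) in the split.
        int mapCalls = 0;
        for (List<String> split : splits) {
            mapCalls += split.size();
        }

        System.out.println("map tasks: " + splits.size());   // map tasks: 4
        System.out.println("map() calls: " + mapCalls);      // map() calls: 16
    }
}
```

The last split holds only the one leftover line, so the work per task is not 
perfectly even when the line count is not a multiple of N.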

-----Original Message-----
From: maha [mailto:[email protected]] 
Sent: Sunday, February 20, 2011 2:00 PM
To: [email protected]
Subject: Re: Quick question

Actually the following solved my problem ... but I'm a little suspicious of 
possible side effects of doing this instead of writing my own InputSplit of 5 
lines.

 conf.setInputFormat(org.apache.hadoop.mapred.lib.NLineInputFormat.class); // by default, # of maps = # of lines
 conf.setInt("mapred.line.input.format.linespermap", 5); // # of lines per mapper = 5

If you have any thoughts on whether the solution above is worse than writing my 
own InputSplit of about 5 lines, let me know.

Thanks everyone !

Maha
            
On Feb 20, 2011, at 11:47 AM, maha wrote:

> Hi again Jim and Ted,
> 
> I understood that each mapper will be getting a block of lines... but even 
> though I had only 2 mappers for a 16-line input file with TextInputFormat, 
> the map function was called for each of those 16 lines!
> 
> I wanted a block of lines per map ... hence something like map1 has 8 lines 
> and map2 has 8 lines. 
> 
> So first question: is there a difference between Mappers and maps ?
> 
> Second: Does that mean I need to write my own inputFormat to make the 
> InputSplit equal to multipleLines ???
> 
> Thank you,
> 
> Maha
> 
> 
> On Feb 18, 2011, at 11:55 AM, Jim Falgout wrote:
> 
>> That's right. The TextInputFormat handles situations where records cross 
>> split boundaries. What your mapper will see is "whole" records. 
>> 
>> -----Original Message-----
>> From: maha [mailto:[email protected]]
>> Sent: Friday, February 18, 2011 1:14 PM
>> To: common-user
>> Subject: Quick question
>> 
>> Hi all,
>> 
>> I want to check if the following statement is right:
>> 
>> If I use TextInputFormat to process a text file with 2000 lines (each ending 
>> with \n) with 20 mappers, then each map will have a sequence of COMPLETE 
>> LINES. 
>> 
>> In other words, the input is not split byte-wise but by lines. 
>> 
>> Is that right?
>> 
>> 
>> Thank you,
>> Maha
>> 
> 

