Actually, the following solved my problem, but I'm a little suspicious of the
side effects of doing this instead of using my own InputSplit of 5 lines.

 // # of maps = # lines / lines per map
 conf.setInputFormat(org.apache.hadoop.mapred.lib.NLineInputFormat.class);
 // # of lines per mapper = 5
 conf.setInt("mapred.line.input.format.linespermap", 5);
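
To make the effect of those two settings concrete, here is a minimal
plain-Java sketch of the grouping, with no Hadoop dependency. The class
NLineGroupingSketch and its group() helper are hypothetical names invented for
illustration and are not Hadoop APIs. It assumes the 16-line input mentioned
later in this thread, and that the map function is still called once per line
inside each map task.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: models how N lines per map task
// partitions an input file. This is not Hadoop code.
class NLineGroupingSketch {

    // Cut the input into consecutive chunks of at most linesPerMap lines;
    // each chunk stands for the input of one map task.
    static List<List<String>> group(List<String> lines, int linesPerMap) {
        List<List<String>> splits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += linesPerMap) {
            int end = Math.min(i + linesPerMap, lines.size());
            splits.add(new ArrayList<>(lines.subList(i, end)));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        for (int i = 1; i <= 16; i++) {
            lines.add("line " + i);
        }

        List<List<String>> splits = group(lines, 5);
        int mapCalls = 0;
        for (int t = 0; t < splits.size(); t++) {
            // One map() call per line within the task.
            mapCalls += splits.get(t).size();
            System.out.println("map task " + t + ": "
                    + splits.get(t).size() + " lines");
        }
        // 16 lines at 5 per task gives 4 tasks (5, 5, 5, 1) and 16 map() calls.
        System.out.println(splits.size() + " map tasks, "
                + mapCalls + " map() calls");
    }
}
```

Under these assumptions the number of map tasks is the line count divided by
the lines per map, rounded up, and the last task may get fewer lines.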

If you have any thoughts on whether the solution above is worse than writing my
own InputSplit of about 5 lines, let me know.

Thanks, everyone!

Maha
            
On Feb 20, 2011, at 11:47 AM, maha wrote:

> Hi again Jim and Ted,
> 
> I understood that each mapper will be getting a block of lines... but even
> though I had only 2 mappers for a 16-line input file with TextInputFormat,
> the map function was called once for each of those 16 lines!
> 
> I wanted a block of lines per map ... hence something like map1 has 8 lines 
> and map2 has 8 lines. 
> 
> So first question: is there a difference between Mappers and maps ?
> 
> Second: Does that mean I need to write my own inputFormat to make the 
> InputSplit equal to multipleLines ???
> 
> Thank you,
> 
> Maha
> 
> 
> On Feb 18, 2011, at 11:55 AM, Jim Falgout wrote:
> 
>> That's right. The TextInputFormat handles situations where records cross 
>> split boundaries. What your mapper will see is "whole" records. 
>> 
>> -----Original Message-----
>> From: maha [mailto:[email protected]] 
>> Sent: Friday, February 18, 2011 1:14 PM
>> To: common-user
>> Subject: Quick question
>> 
>> Hi all,
>> 
>> I want to check if the following statement is right:
>> 
>> If I use TextInputFormat to process a text file with 2000 lines (each ending
>> with \n) with 20 mappers, then each map will have a sequence of COMPLETE
>> LINES.
>> 
>> In other words,  the input is not split byte-wise but by lines. 
>> 
>> Is that right?
>> 
>> 
>> Thank you,
>> Maha
>> 
> 
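
On the point quoted above, that TextInputFormat hands each mapper "whole"
records even when a line crosses a split boundary, the following is a
simplified plain-Java model of that behaviour. It is a sketch, not Hadoop's
actual record reader: SplitBoundarySketch and readSplit() are hypothetical
names, the input is assumed to be ASCII so character indexes stand in for byte
offsets, and the rule modelled is "a line belongs to the split that contains
its first byte".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: a simplified model of reading whole
// lines from a byte range [start, end). This is not Hadoop code.
class SplitBoundarySketch {

    static List<String> readSplit(String data, int start, int end) {
        int pos = start;
        if (start != 0) {
            // Back up one character and skip through the next newline, so a
            // line that began in the previous split is left to that split.
            int nl = data.indexOf('\n', start - 1);
            pos = (nl == -1) ? data.length() : nl + 1;
        }
        List<String> records = new ArrayList<>();
        // Read every line that starts before 'end', even if it finishes
        // beyond 'end'.
        while (pos < end && pos < data.length()) {
            int nl = data.indexOf('\n', pos);
            int lineEnd = (nl == -1) ? data.length() : nl;
            records.add(data.substring(pos, lineEnd));
            pos = lineEnd + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        String data = "alpha\nbravo\ncharlie\ndelta\n"; // 26 characters
        // The boundary at offset 13 falls in the middle of "charlie".
        System.out.println(readSplit(data, 0, 13));  // [alpha, bravo, charlie]
        System.out.println(readSplit(data, 13, 26)); // [delta]
    }
}
```

In this model the split is still a byte range, but every line is read exactly
once and never cut in half: the first split reads past its end to finish
"charlie", and the second split skips the tail of that line.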
