Yet the map function was still invoked 16 times, as described by the 
NLineInputSplit. I want the map function to be called once for the whole 
InputSplit of 5 lines, not once for each of the 16 lines.

Any ideas other than building my own InputFormat?

Thank you,

Maha
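
A minimal plain-Java sketch of the distinction discussed in this thread, assuming the framework hands each record (line) of a split to the map function separately. It uses no Hadoop classes, and every name in it (SplitVersusRecordDemo, splitByLines, countPerRecordCalls, processWholeSplits) is hypothetical. It only illustrates why 16 lines grouped 5 per split can still produce 16 map calls, and what "one call per split" would look like instead.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation (no Hadoop classes). All names are hypothetical.
class SplitVersusRecordDemo {

    // Group consecutive lines into splits of at most linesPerSplit lines,
    // imitating what an N-lines-per-split input format does.
    static List<List<String>> splitByLines(List<String> lines, int linesPerSplit) {
        if (linesPerSplit <= 0) {
            throw new IllegalArgumentException("linesPerSplit must be positive");
        }
        List<List<String>> splits = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += linesPerSplit) {
            int end = Math.min(start + linesPerSplit, lines.size());
            splits.add(new ArrayList<>(lines.subList(start, end)));
        }
        return splits;
    }

    // Record-at-a-time style: the per-record function runs once per line,
    // however the lines were grouped into splits.
    static int countPerRecordCalls(List<List<String>> splits) {
        int calls = 0;
        for (List<String> split : splits) {
            for (String line : split) {
                calls++; // stands in for one map(key, value) invocation
            }
        }
        return calls;
    }

    // Split-at-a-time style: each split's lines are handled together,
    // so the processing step runs once per split.
    static List<String> processWholeSplits(List<List<String>> splits) {
        List<String> results = new ArrayList<>();
        for (List<String> split : splits) {
            results.add(String.join("|", split)); // one call sees the whole split
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        for (int i = 1; i <= 16; i++) {
            lines.add("line" + i);
        }
        List<List<String>> splits = splitByLines(lines, 5);
        System.out.println("splits: " + splits.size());                              // splits: 4
        System.out.println("per-record calls: " + countPerRecordCalls(splits));      // per-record calls: 16
        System.out.println("per-split calls: " + processWholeSplits(splits).size()); // per-split calls: 4
    }
}
```

In Hadoop's old mapred API, one way to get the per-split behaviour without a new InputFormat may be to buffer the values inside map() and do the real work in close(), or to supply a custom MapRunnable via JobConf.setMapRunnerClass. Treat both as suggestions to verify against your Hadoop version.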
 
On Feb 20, 2011, at 11:59 AM, maha wrote:

> Actually the following solved my problem ... but I'm a little suspicious of 
> the side effects of doing this instead of writing my own InputSplit of 5 
> lines.
> 
> conf.setInputFormat(org.apache.hadoop.mapred.lib.NLineInputFormat.class); // # of maps = # of lines
> conf.setInt("mapred.line.input.format.linespermap", 5); // # of lines per mapper = 5
> 
> If you have any thoughts on whether the solution above is worse than writing 
> my own InputSplit of about 5 lines, let me know.
> 
> Thanks everyone !
> 
> Maha
>           
> On Feb 20, 2011, at 11:47 AM, maha wrote:
> 
>> Hi again Jim and Ted,
>> 
>> I understood that each mapper will be getting a block of lines... but even 
>> though I had only 2 mappers for a 16-line input file with TextInputFormat, 
>> the map function is still invoked for each of those 16 lines!
>> 
>> I wanted a block of lines per map ... hence something like map1 has 8 lines 
>> and map2 has 8 lines. 
>> 
>> So first question: is there a difference between Mappers and maps?
>> 
>> Second: does that mean I need to write my own InputFormat to make the 
>> InputSplit span multiple lines?
>> 
>> Thank you,
>> 
>> Maha
>> 
>> 
>> On Feb 18, 2011, at 11:55 AM, Jim Falgout wrote:
>> 
>>> That's right. The TextInputFormat handles situations where records cross 
>>> split boundaries. What your mapper will see is "whole" records. 
>>> 
>>> -----Original Message-----
>>> From: maha [mailto:[email protected]] 
>>> Sent: Friday, February 18, 2011 1:14 PM
>>> To: common-user
>>> Subject: Quick question
>>> 
>>> Hi all,
>>> 
>>> I want to check if the following statement is right:
>>> 
>>> If I use TextInputFormat to process a text file with 2000 lines (each 
>>> ending with \n) with 20 mappers, then each map will receive a sequence of 
>>> COMPLETE LINES.
>>> 
>>> In other words,  the input is not split byte-wise but by lines. 
>>> 
>>> Is that right?
>>> 
>>> 
>>> Thank you,
>>> Maha
>>> 
>> 
> 
