Yet the map function was still called 16 times, once per line, as described
for NLineInputFormat. I want the map function to be called once for the whole
InputSplit of 5 lines, not once for each of the 16 lines.
Any ideas other than building my own InputFormat?
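
To make the question concrete, here is a minimal plain-Java sketch of what I
am seeing and of one possible workaround. It is a simulation only, not Hadoop
code: SplitSketch, splitByLines and BufferingMapper are hypothetical names. It
assumes the framework calls map() once per record (line) and offers a
per-split hook afterwards (close() in the old mapred API), so a mapper could
buffer its lines and process them together at the end of the split.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java simulation (not Hadoop code) of per-line vs. per-split processing. */
class SplitSketch {

    /** Groups lines into consecutive splits of at most linesPerSplit lines. */
    static List<List<String>> splitByLines(List<String> lines, int linesPerSplit) {
        List<List<String>> splits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += linesPerSplit) {
            int end = Math.min(i + linesPerSplit, lines.size());
            splits.add(new ArrayList<>(lines.subList(i, end)));
        }
        return splits;
    }

    /** Hypothetical mapper: map() runs once per line, close() once per split. */
    static class BufferingMapper {
        private final List<String> buffer = new ArrayList<>();
        int mapCalls = 0;

        /** Called for every line; only records the line for later. */
        void map(String line) {
            mapCalls++;
            buffer.add(line);
        }

        /** Called once after the last line; handles the whole split at once. */
        String close() {
            return String.join("|", buffer);
        }
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        for (int i = 1; i <= 16; i++) {
            lines.add("line" + i);
        }
        int totalMapCalls = 0;
        for (List<String> split : splitByLines(lines, 5)) {
            BufferingMapper mapper = new BufferingMapper();
            for (String line : split) {
                mapper.map(line);
            }
            totalMapCalls += mapper.mapCalls;
            System.out.println(split.size() + " lines -> " + mapper.close());
        }
        System.out.println("map() calls in total: " + totalMapCalls);
    }
}
```

With 16 lines and 5 lines per split this yields 4 splits (5, 5, 5 and 1
lines), and map() still runs 16 times in total; only close() runs once per
split. Whether buffering like this is better than a custom InputFormat is
exactly what I am unsure about.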
Thank you,
Maha
On Feb 20, 2011, at 11:59 AM, maha wrote:
> Actually, the following solved my problem, but I'm a little suspicious of
> the side effects of doing this instead of writing my own InputSplit of
> 5 lines.
>
> // # of maps = # lines
> conf.setInputFormat(org.apache.hadoop.mapred.lib.NLineInputFormat.class);
> // # of lines per mapper = 5
> conf.setInt("mapred.line.input.format.linespermap", 5);
>
> If you have any thoughts on whether the above solution is worse than writing
> my own InputSplit of about 5 lines, let me know.
>
> Thanks everyone !
>
> Maha
>
> On Feb 20, 2011, at 11:47 AM, maha wrote:
>
>> Hi again Jim and Ted,
>>
>> I understood that each mapper gets a block of lines, but even though I had
>> only 2 mappers for a 16-line input file and used TextInputFormat, the map
>> function was still called for each of those 16 lines!
>>
>> I wanted a block of lines per map call, i.e. something like map1 gets
>> 8 lines and map2 gets 8 lines.
>>
>> So, first question: is there a difference between Mappers and maps?
>>
>> Second: does that mean I need to write my own InputFormat to make each
>> InputSplit equal to multiple lines?
>>
>> Thank you,
>>
>> Maha
>>
>>
>> On Feb 18, 2011, at 11:55 AM, Jim Falgout wrote:
>>
>>> That's right. The TextInputFormat handles situations where records cross
>>> split boundaries. What your mapper will see is "whole" records.
>>>
>>> -----Original Message-----
>>> From: maha [mailto:[email protected]]
>>> Sent: Friday, February 18, 2011 1:14 PM
>>> To: common-user
>>> Subject: Quick question
>>>
>>> Hi all,
>>>
>>> I want to check if the following statement is right:
>>>
>>> If I use TextInputFormat to process a text file with 2000 lines (each
>>> ending with \n) with 20 mappers, then each map will get a sequence of
>>> COMPLETE LINES.
>>>
>>> In other words, the input is not split byte-wise but by lines.
>>>
>>> Is that right?
>>>
>>>
>>> Thank you,
>>> Maha
>>>
>>
>
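
For reference, the point at the bottom of the thread (splits are byte ranges,
yet each mapper only sees whole lines) can be sketched in plain Java. This is
a simplified illustration under stated assumptions, not Hadoop's actual
reader: LineSplitSketch and readSplit are hypothetical names, and the rule
assumed here is that a split other than the first skips the line in progress
at its start offset, while every split finishes the last line it begins even
if that line runs past its end offset.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Plain-Java illustration (not Hadoop code) of reading whole lines from a byte-range split. */
class LineSplitSketch {

    /** Returns the complete lines owned by the byte range [start, end) of data. */
    static List<String> readSplit(byte[] data, int start, int end) {
        int pos = start;
        if (start != 0) {
            // Skip the line in progress at 'start'; the previous split reads it.
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // step past the newline
        }
        List<String> lines = new ArrayList<>();
        // Begin a line only if its first byte is at or before 'end',
        // then read past 'end' if needed to finish that line.
        while (pos <= end && pos < data.length) {
            int lineStart = pos;
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            lines.add(new String(data, lineStart, pos - lineStart, StandardCharsets.UTF_8));
            pos++; // step past the newline
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] data = "aaa\nbbb\nccc\n".getBytes(StandardCharsets.UTF_8);
        // The cut at byte 6 falls in the middle of "bbb".
        System.out.println(readSplit(data, 0, 6));
        System.out.println(readSplit(data, 6, data.length));
    }
}
```

In this sketch the first range returns "aaa" and "bbb" even though "bbb"
crosses the cut, and the second range returns only "ccc", so every line is
delivered exactly once and never in pieces.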