Re: Quick question

maha Mon, 21 Feb 2011 07:54:13 -0800

Thanks for your answers Ted and Jim :)

Maha


On Feb 21, 2011, at 6:41 AM, Jim Falgout wrote:

> Your scenario matches the capability of NLineInputFormat exactly, so that 
> looks to be the best solution. If you wrote your own input format, it would 
> basically have to do what NLineInputFormat is already doing for you.
> 
> -----Original Message-----
> From: maha [mailto:[email protected]] 
> Sent: Sunday, February 20, 2011 2:00 PM
> To: [email protected]
> Subject: Re: Quick question
> 
> Actually the following solved my problem ... but I'm a little suspicious of 
> the side effects of doing this instead of writing my own InputSplit that 
> holds 5 lines.
> 
> conf.setInputFormat(org.apache.hadoop.mapred.lib.NLineInputFormat.class);
> // # of maps = # lines
> conf.setInt("mapred.line.input.format.linespermap", 5);
> // # of lines per mapper = 5
> 
> If you have any thoughts on whether the above solution is worse than writing 
> my own InputSplit of about 5 lines, let me know.
> 
> Thanks everyone !
> 
> Maha
>           
> On Feb 20, 2011, at 11:47 AM, maha wrote:
> 
>> Hi again Jim and Ted,
>> 
>> I understood that each mapper will be getting a block of lines... but even 
>> though I had only 2 mappers for a 16-line input file with TextInputFormat, 
>> the map function was called once for each of those 16 lines!
>> 
>> I wanted a block of lines per map ... hence something like map1 has 8 lines 
>> and map2 has 8 lines. 
>> 
>> So first question: is there a difference between Mappers and maps?
>> 
>> Second: does that mean I need to write my own InputFormat to make the 
>> InputSplit equal to multiple lines?
>> 
>> Thank you,
>> 
>> Maha
>> 
>> 
>> On Feb 18, 2011, at 11:55 AM, Jim Falgout wrote:
>> 
>>> That's right. The TextInputFormat handles situations where records cross 
>>> split boundaries. What your mapper will see is "whole" records. 
>>> 
>>> -----Original Message-----
>>> From: maha [mailto:[email protected]]
>>> Sent: Friday, February 18, 2011 1:14 PM
>>> To: common-user
>>> Subject: Quick question
>>> 
>>> Hi all,
>>> 
>>> I want to check if the following statement is right:
>>> 
>>> If I use TextInputFormat to process a text file with 2000 lines (each 
>>> ending with \n) with 20 mappers, then each map will have a sequence of 
>>> COMPLETE LINES.
>>> 
>>> In other words, the input is not split byte-wise but by lines.
>>> 
>>> Is that right?
>>> 
>>> 
>>> Thank you,
>>> Maha
>>> 
>> 
> 
>
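To make the "whole records" point from the first exchange concrete, here is a
minimal sketch in plain Java with no Hadoop dependency. It is not Hadoop's
code. The class SplitLineSketch, the method linesForSplit, and the exact
convention (skip the first line of every split except the first, then keep
reading until the last started line is finished, even past the split end) are
assumptions chosen for illustration, modeled on the behavior Jim describes.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SplitLineSketch {

    // Returns the complete lines a reader would hand out for the byte range
    // [start, end) of data. Hypothetical helper, not a Hadoop API.
    static List<String> linesForSplit(byte[] data, int start, int end) {
        int pos = start;
        if (start != 0) {
            // Assumed convention: a non-first split discards everything up to
            // and including the first newline, because the previous split
            // reads that line to completion.
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // step past the newline
        }
        List<String> lines = new ArrayList<>();
        // Read every line that starts at or before the split end, finishing
        // it even if its bytes run past the boundary.
        while (pos <= end && pos < data.length) {
            int lineEnd = pos;
            while (lineEnd < data.length && data[lineEnd] != '\n') {
                lineEnd++;
            }
            lines.add(new String(data, pos, lineEnd - pos, StandardCharsets.US_ASCII));
            pos = lineEnd + 1;
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] data = "line one\nline two\nline three\n".getBytes(StandardCharsets.US_ASCII);
        int mid = data.length / 2; // a byte boundary that cuts through a line
        System.out.println("split 1: " + linesForSplit(data, 0, mid));
        System.out.println("split 2: " + linesForSplit(data, mid, data.length));
    }
}
```

The boundaries are byte offsets, yet every line comes out whole and exactly
once across the two splits. That is the sense in which each map sees a
sequence of complete lines.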

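On the lines-per-map question, the grouping that a setting such as
"mapred.line.input.format.linespermap" = 5 asks for can be pictured with
another small sketch. Again this is plain Java and not NLineInputFormat
itself; LinesPerMapSketch and groupLines are hypothetical names. The map
function is still called once per line either way; what changes is how many
lines each map task receives.

```java
import java.util.ArrayList;
import java.util.List;

public class LinesPerMapSketch {

    // Groups lines into consecutive chunks of at most linesPerMap lines,
    // one chunk per (hypothetical) map task.
    static List<List<String>> groupLines(List<String> lines, int linesPerMap) {
        if (linesPerMap < 1) {
            throw new IllegalArgumentException("linesPerMap must be at least 1");
        }
        List<List<String>> groups = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += linesPerMap) {
            int end = Math.min(i + linesPerMap, lines.size());
            groups.add(new ArrayList<>(lines.subList(i, end)));
        }
        return groups;
    }

    public static void main(String[] args) {
        // A 16-line input, as in the thread.
        List<String> lines = new ArrayList<>();
        for (int i = 1; i <= 16; i++) {
            lines.add("line " + i);
        }
        List<List<String>> groups = groupLines(lines, 5);
        for (int i = 0; i < groups.size(); i++) {
            System.out.println("map task " + (i + 1) + " gets " + groups.get(i).size() + " lines");
        }
    }
}
```

With 16 lines and 5 lines per map, this yields four groups of 5, 5, 5 and 1
lines. Getting the 8 + 8 layout mentioned earlier in the thread would mean
passing 8 instead of 5.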