Thanks for your answers, Ted and Jim :)

Maha

On Feb 21, 2011, at 6:41 AM, Jim Falgout wrote:

> Your scenario matches the capability of NLineInputFormat exactly, so that
> looks to be the best solution. If you wrote your own input format, it would
> have to basically do what NLineInputFormat is already doing for you.
>
> -----Original Message-----
> From: maha [mailto:[email protected]]
> Sent: Sunday, February 20, 2011 2:00 PM
> To: [email protected]
> Subject: Re: Quick question
>
> Actually the following solved my problem ... but I'm a little suspicious of
> the side effects of doing the following instead of writing my own InputSplit
> of 5 lines.
>
> conf.setInputFormat(org.apache.hadoop.mapred.lib.NLineInputFormat.class); // # of maps = # lines
> conf.setInt("mapred.line.input.format.linespermap", 5); // # of lines per mapper = 5
>
> If you have any thoughts on whether the above solution is worse than writing
> my own InputSplit of about 5 lines, let me know.
>
> Thanks everyone !
>
> Maha
>
> On Feb 20, 2011, at 11:47 AM, maha wrote:
>
>> Hi again Jim and Ted,
>>
>> I understood that each mapper will be getting a block of lines... but even
>> though I had only 2 mappers for a 16-line input file and
>> TextInputFormat is used, a map function is processed for each of those 16
>> lines!
>>
>> I wanted a block of lines per map ... hence something like map1 has 8 lines
>> and map2 has 8 lines.
>>
>> So first question: is there a difference between Mappers and maps ?
>>
>> Second: Does that mean I need to write my own InputFormat to make the
>> InputSplit equal to multiple lines ???
>>
>> Thank you,
>>
>> Maha
>>
>>
>> On Feb 18, 2011, at 11:55 AM, Jim Falgout wrote:
>>
>>> That's right. The TextInputFormat handles situations where records cross
>>> split boundaries. What your mapper will see is "whole" records.
>>>
>>> -----Original Message-----
>>> From: maha [mailto:[email protected]]
>>> Sent: Friday, February 18, 2011 1:14 PM
>>> To: common-user
>>> Subject: Quick question
>>>
>>> Hi all,
>>>
>>> I want to check if the following statement is right:
>>>
>>> If I use TextInputFormat to process a text file with 2000 lines (each
>>> ending with \n) with 20 mappers, then each map will have a sequence of
>>> COMPLETE LINES.
>>>
>>> In other words, the input is not split byte-wise but by lines.
>>>
>>> Is that right?
>>>
>>>
>>> Thank you,
>>> Maha
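
On the question in the thread about Mappers versus maps, the rough picture is that a map task (Mapper) handles one split, while the map function is still called once per record (line) inside that split. The following is only a minimal plain-Java sketch of that idea with no Hadoop dependency. The class and method names (NLineSplitSketch, splitByLines, countMapCalls) are hypothetical and not part of Hadoop, and the grouping logic only mimics the "N lines per split" behaviour discussed above rather than reproducing NLineInputFormat's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not Hadoop code.
public class NLineSplitSketch {

    // Group consecutive lines into splits of at most linesPerSplit lines.
    // Each returned inner list stands in for the input of one map task.
    public static List<List<String>> splitByLines(List<String> lines, int linesPerSplit) {
        if (linesPerSplit <= 0) {
            throw new IllegalArgumentException("linesPerSplit must be positive");
        }
        List<List<String>> splits = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += linesPerSplit) {
            int end = Math.min(start + linesPerSplit, lines.size());
            splits.add(new ArrayList<>(lines.subList(start, end)));
        }
        return splits;
    }

    // Count how many times a map function would be invoked in total,
    // assuming one invocation per line regardless of how lines are grouped.
    public static int countMapCalls(List<List<String>> splits) {
        int calls = 0;
        for (List<String> split : splits) {
            calls += split.size();
        }
        return calls;
    }

    public static void main(String[] args) {
        // A 16-line input, as in the example from the thread.
        List<String> lines = new ArrayList<>();
        for (int i = 1; i <= 16; i++) {
            lines.add("line " + i);
        }

        List<List<String>> splits = splitByLines(lines, 5);
        System.out.println("map tasks (splits): " + splits.size());
        System.out.println("map() calls (lines): " + countMapCalls(splits));
    }
}
```

With 16 lines and 5 lines per split, this sketch yields four groups (5, 5, 5 and 1 lines) but still 16 per-line calls, which is why a map function runs for every line even when only a few map tasks exist. To process a block of lines as one unit, the lines would have to be buffered inside the mapper or delivered as a single record by a custom record reader.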
