The new-API NLineInputFormat is only available from 1.0.1 onward; it is not in any of the earlier 1.x (1.0.0) or 0.20 (0.20.x, 0.20.xxx) vanilla Apache releases.
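For anyone stuck on one of those earlier releases, the old-API class org.apache.hadoop.mapred.lib.NLineInputFormat should already be available in the 0.20/1.0.0 lines and does the same job. Below is a minimal sketch of an old-API driver, not a drop-in: the class names (OneLinePerMapOldApi, InstructionMapper), the job name, and the command-line paths are made up for illustration.

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.NLineInputFormat;

public class OneLinePerMapOldApi {

  // Hypothetical mapper: each call receives one instruction line to process.
  public static class InstructionMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, Text> {
    public void map(LongWritable offset, Text line,
                    OutputCollector<Text, Text> out, Reporter reporter)
        throws IOException {
      // ... the lengthy per-line processing would go here ...
      out.collect(line, new Text("done"));
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(OneLinePerMapOldApi.class);
    conf.setJobName("one-line-per-map");

    // Old-API NLineInputFormat: each input split carries N lines, so with
    // N = 1 every line of the instructions file becomes its own map task.
    conf.setInputFormat(NLineInputFormat.class);
    conf.setInt("mapred.line.input.format.linespermap", 1);

    conf.setMapperClass(InstructionMapper.class);
    conf.setNumReduceTasks(0); // map-only job
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);
  }
}

With one line per split, the framework schedules each line as its own map task, so the work spreads across the cluster instead of riding on a single input split.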
On Fri, Feb 3, 2012 at 7:08 AM, Praveen Sripati <[email protected]> wrote:

> Mark,
>
> NLineInputFormat was not something which was introduced in 0.21, I have
> just sent the reference to the 0.21 url FYI. It's in 0.20.205, 1.0.0 and
> 0.23 releases also.
>
> Praveen
>
> On Fri, Feb 3, 2012 at 1:25 AM, Mark Kerzner <[email protected]> wrote:
>
>> Praveen,
>>
>> this seems just like the right thing, but it's API 0.21 (I googled about
>> the problems with it), so I have to use either the next Cloudera release,
>> or Hortonworks, or something, am I right?
>>
>> Mark
>>
>> On Thu, Feb 2, 2012 at 7:39 AM, Praveen Sripati <[email protected]> wrote:
>>
>> > > I have a simple MR job, and I want each Mapper to get one line from my
>> > > input file (which contains further instructions for lengthy processing).
>> >
>> > Use the NLineInputFormat class.
>> >
>> > http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/org/apache/hadoop/mapreduce/lib/input/NLineInputFormat.html
>> >
>> > Praveen
>> >
>> > On Thu, Feb 2, 2012 at 9:43 AM, Mark Kerzner <[email protected]> wrote:
>> >
>> > > Thanks!
>> > > Mark
>> > >
>> > > On Wed, Feb 1, 2012 at 7:44 PM, Anil Gupta <[email protected]> wrote:
>> > >
>> > > > Yes, if ur block size is 64mb. Btw, block size is configurable in Hadoop.
>> > > >
>> > > > Best Regards,
>> > > > Anil
>> > > >
>> > > > On Feb 1, 2012, at 5:06 PM, Mark Kerzner <[email protected]> wrote:
>> > > >
>> > > > > Anil,
>> > > > >
>> > > > > do you mean one block of HDFS, like 64MB?
>> > > > >
>> > > > > Mark
>> > > > >
>> > > > > On Wed, Feb 1, 2012 at 7:03 PM, Anil Gupta <[email protected]> wrote:
>> > > > >
>> > > > >> Do u have enough data to start more than one mapper?
>> > > > >> If entire data is less than a block size then only 1 mapper will run.
>> > > > >>
>> > > > >> Best Regards,
>> > > > >> Anil
>> > > > >>
>> > > > >> On Feb 1, 2012, at 4:21 PM, Mark Kerzner <[email protected]> wrote:
>> > > > >>
>> > > > >>> Hi,
>> > > > >>>
>> > > > >>> I have a simple MR job, and I want each Mapper to get one line from my
>> > > > >>> input file (which contains further instructions for lengthy processing).
>> > > > >>> Each line is 100 characters long, and I tell Hadoop to read only 100 bytes,
>> > > > >>>
>> > > > >>> job.getConfiguration().setInt("mapreduce.input.linerecordreader.line.maxlength", 100);
>> > > > >>>
>> > > > >>> I see that this part works - it reads only one line at a time, and if I
>> > > > >>> change this parameter, it listens.
>> > > > >>>
>> > > > >>> However, on a cluster only one node receives all the map tasks. Only one
>> > > > >>> map tasks is started. The others never get anything, they just wait. I've
>> > > > >>> added 100 seconds wait to the mapper - no change!
>> > > > >>>
>> > > > >>> Any advice?
>> > > > >>>
>> > > > >>> Thank you. Sincerely,
>> > > > >>> Mark

--
Harsh J
Customer Ops. Engineer
Cloudera | http://tiny.cloudera.com/about
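For reference, the new-API route that Praveen linked looks roughly like the sketch below once you are on a release that ships org.apache.hadoop.mapreduce.lib.input.NLineInputFormat (1.0.1 or later, per the note at the top). The class and path names (OneLinePerMapNewApi, InstructionMapper, args[0]/args[1]) are placeholders, not anything from the thread. Note that mapreduce.input.linerecordreader.line.maxlength only caps how long a single record may be; it does not influence how splits are formed or assigned, which is why setting it alone still produced one map task.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class OneLinePerMapNewApi {

  // Hypothetical mapper: each invocation gets one instruction line.
  public static class InstructionMapper
      extends Mapper<LongWritable, Text, Text, NullWritable> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      // ... the lengthy per-line processing would go here ...
      context.write(line, NullWritable.get());
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "one-line-per-map");
    job.setJarByClass(OneLinePerMapNewApi.class);

    // One line per split => one map task per line of the instructions file.
    job.setInputFormatClass(NLineInputFormat.class);
    NLineInputFormat.setNumLinesPerSplit(job, 1);

    job.setMapperClass(InstructionMapper.class);
    job.setNumReduceTasks(0); // map-only job
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(NullWritable.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}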
