And that is exactly what I found. I have a "hack" for now - pass all the files on the command line - and I will wait for the next release in some distribution.
Thank you,
Mark

On Thu, Feb 2, 2012 at 9:55 PM, Harsh J <[email protected]> wrote:

> The new API NLineInputFormat is only available from 1.0.1, and not in any
> of the earlier 1.x (1.0.0) or 0.20 (0.20.x, 0.20.xxx) vanilla Apache
> releases.
>
> On Fri, Feb 3, 2012 at 7:08 AM, Praveen Sripati <[email protected]> wrote:
>
> > Mark,
> >
> > NLineInputFormat was not something that was introduced in 0.21; I had
> > just sent the reference to the 0.21 URL FYI. It's in the 0.20.205, 1.0.0
> > and 0.23 releases also.
> >
> > Praveen
> >
> > On Fri, Feb 3, 2012 at 1:25 AM, Mark Kerzner <[email protected]> wrote:
> >
> >> Praveen,
> >>
> >> this seems just like the right thing, but it's the 0.21 API (I googled
> >> about the problems with it), so I have to use either the next Cloudera
> >> release, or Hortonworks, or something, am I right?
> >>
> >> Mark
> >>
> >> On Thu, Feb 2, 2012 at 7:39 AM, Praveen Sripati <[email protected]> wrote:
> >>
> >> > > I have a simple MR job, and I want each Mapper to get one line from
> >> > > my input file (which contains further instructions for lengthy
> >> > > processing).
> >> >
> >> > Use the NLineInputFormat class.
> >> >
> >> > http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/org/apache/hadoop/mapreduce/lib/input/NLineInputFormat.html
> >> >
> >> > Praveen
> >> >
> >> > On Thu, Feb 2, 2012 at 9:43 AM, Mark Kerzner <[email protected]> wrote:
> >> >
> >> > > Thanks!
> >> > > Mark
> >> > >
> >> > > On Wed, Feb 1, 2012 at 7:44 PM, Anil Gupta <[email protected]> wrote:
> >> > >
> >> > > > Yes, if your block size is 64 MB. By the way, the block size is
> >> > > > configurable in Hadoop.
> >> > > >
> >> > > > Best Regards,
> >> > > > Anil
> >> > > >
> >> > > > On Feb 1, 2012, at 5:06 PM, Mark Kerzner <[email protected]> wrote:
> >> > > >
> >> > > > > Anil,
> >> > > > >
> >> > > > > do you mean one block of HDFS, like 64 MB?
> >> > > > >
> >> > > > > Mark
> >> > > > >
> >> > > > > On Wed, Feb 1, 2012 at 7:03 PM, Anil Gupta <[email protected]> wrote:
> >> > > > >
> >> > > > >> Do you have enough data to start more than one mapper?
> >> > > > >> If the entire data is less than one block size, then only 1
> >> > > > >> mapper will run.
> >> > > > >>
> >> > > > >> Best Regards,
> >> > > > >> Anil
> >> > > > >>
> >> > > > >> On Feb 1, 2012, at 4:21 PM, Mark Kerzner <[email protected]> wrote:
> >> > > > >>
> >> > > > >>> Hi,
> >> > > > >>>
> >> > > > >>> I have a simple MR job, and I want each Mapper to get one line
> >> > > > >>> from my input file (which contains further instructions for
> >> > > > >>> lengthy processing). Each line is 100 characters long, and I
> >> > > > >>> tell Hadoop to read only 100 bytes:
> >> > > > >>>
> >> > > > >>> job.getConfiguration().setInt("mapreduce.input.linerecordreader.line.maxlength", 100);
> >> > > > >>>
> >> > > > >>> I see that this part works - it reads only one line at a time,
> >> > > > >>> and if I change this parameter, it listens.
> >> > > > >>>
> >> > > > >>> However, on a cluster only one node receives all the map tasks.
> >> > > > >>> Only one map task is started. The others never get anything;
> >> > > > >>> they just wait. I've added a 100-second wait to the mapper -
> >> > > > >>> no change!
> >> > > > >>>
> >> > > > >>> Any advice?
> >> > > > >>>
> >> > > > >>> Thank you. Sincerely,
> >> > > > >>> Mark
>
> --
> Harsh J
> Customer Ops. Engineer
> Cloudera | http://tiny.cloudera.com/about
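[Editor's note] The split behavior discussed above can be sketched without a Hadoop cluster at all: NLineInputFormat produces one input split per N lines (N defaults to 1), and Hadoop then starts one map task per split, which is what spreads the work across nodes. The `line.maxlength` setting Mark used only limits how many bytes the record reader accepts per line; it does not change the split count. The class and method below are illustrative names, not Hadoop APIs; only the splitting idea is taken from the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Minimal sketch of the splitting NLineInputFormat performs: group the
// input's lines into splits of at most linesPerSplit lines each. In a real
// job, Hadoop would launch one map task per returned split.
public class NLineSplitSketch {

    static List<List<String>> split(List<String> lines, int linesPerSplit) {
        List<List<String>> splits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += linesPerSplit) {
            // Each split is a window of up to linesPerSplit consecutive lines.
            splits.add(lines.subList(i, Math.min(i + linesPerSplit, lines.size())));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> instructions = Arrays.asList("task1", "task2", "task3");
        // With N = 1, each line becomes its own split, so three map tasks
        // could run in parallel instead of one.
        System.out.println(split(instructions, 1).size()); // prints 3
    }
}
```

In an actual driver (Hadoop 1.0.1 or later, new API), the equivalent configuration is `job.setInputFormatClass(NLineInputFormat.class)` plus `NLineInputFormat.setNumLinesPerSplit(job, 1)` from `org.apache.hadoop.mapreduce.lib.input`.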
