The new-API NLineInputFormat is only available from 1.0.1 onward; it is not in any of the earlier 1.x (1.0.0) or 0.20 (0.20.x, 0.20.xxx) vanilla Apache releases.
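For anyone stuck on one of those earlier releases, the old-API class org.apache.hadoop.mapred.lib.NLineInputFormat should already be available in the 0.20/1.0.0 lines and does the same job. Below is a minimal sketch of an old-API driver, not a drop-in: the class names (OneLinePerMapOldApi, InstructionMapper), the job name, and the command-line paths are made up for illustration.

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.NLineInputFormat;

public class OneLinePerMapOldApi {

  // Hypothetical mapper: each call receives one instruction line to process.
  public static class InstructionMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, Text> {
    public void map(LongWritable offset, Text line,
                    OutputCollector<Text, Text> out, Reporter reporter)
        throws IOException {
      // ... the lengthy per-line processing would go here ...
      out.collect(line, new Text("done"));
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(OneLinePerMapOldApi.class);
    conf.setJobName("one-line-per-map");

    // Old-API NLineInputFormat: each input split carries N lines, so with
    // N = 1 every line of the instructions file becomes its own map task.
    conf.setInputFormat(NLineInputFormat.class);
    conf.setInt("mapred.line.input.format.linespermap", 1);

    conf.setMapperClass(InstructionMapper.class);
    conf.setNumReduceTasks(0); // map-only job
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);
  }
}

With one line per split, the framework schedules each line as its own map task, so the work spreads across the cluster instead of riding on a single input split.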
On Fri, Feb 3, 2012 at 7:08 AM, Praveen Sripati <[email protected]> wrote:

> Mark,
>
> NLineInputFormat was not something which was introduced in 0.21, I have
> just sent the reference to the 0.21 url FYI. It's in 0.20.205, 1.0.0 and
> 0.23 releases also.
>
> Praveen
>
> On Fri, Feb 3, 2012 at 1:25 AM, Mark Kerzner <[email protected]> wrote:
>
>> Praveen,
>>
>> this seems just like the right thing, but it's API 0.21 (I googled about
>> the problems with it), so I have to use either the next Cloudera release,
>> or Hortonworks, or something, am I right?
>>
>> Mark
>>
>> On Thu, Feb 2, 2012 at 7:39 AM, Praveen Sripati <[email protected]> wrote:
>>
>> > > I have a simple MR job, and I want each Mapper to get one line from my
>> > > input file (which contains further instructions for lengthy processing).
>> >
>> > Use the NLineInputFormat class.
>> >
>> > http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/org/apache/hadoop/mapreduce/lib/input/NLineInputFormat.html
>> >
>> > Praveen
>> >
>> > On Thu, Feb 2, 2012 at 9:43 AM, Mark Kerzner <[email protected]> wrote:
>> >
>> > > Thanks!
>> > > Mark
>> > >
>> > > On Wed, Feb 1, 2012 at 7:44 PM, Anil Gupta <[email protected]> wrote:
>> > >
>> > > > Yes, if ur block size is 64mb. Btw, block size is configurable in Hadoop.
>> > > >
>> > > > Best Regards,
>> > > > Anil
>> > > >
>> > > > On Feb 1, 2012, at 5:06 PM, Mark Kerzner <[email protected]> wrote:
>> > > >
>> > > > > Anil,
>> > > > >
>> > > > > do you mean one block of HDFS, like 64MB?
>> > > > >
>> > > > > Mark
>> > > > >
>> > > > > On Wed, Feb 1, 2012 at 7:03 PM, Anil Gupta <[email protected]> wrote:
>> > > > >
>> > > > >> Do u have enough data to start more than one mapper?
>> > > > >> If entire data is less than a block size then only 1 mapper will run.
>> > > > >>
>> > > > >> Best Regards,
>> > > > >> Anil
>> > > > >>
>> > > > >> On Feb 1, 2012, at 4:21 PM, Mark Kerzner <[email protected]> wrote:
>> > > > >>
>> > > > >>> Hi,
>> > > > >>>
>> > > > >>> I have a simple MR job, and I want each Mapper to get one line from my
>> > > > >>> input file (which contains further instructions for lengthy processing).
>> > > > >>> Each line is 100 characters long, and I tell Hadoop to read only 100 bytes,
>> > > > >>>
>> > > > >>> job.getConfiguration().setInt("mapreduce.input.linerecordreader.line.maxlength", 100);
>> > > > >>>
>> > > > >>> I see that this part works - it reads only one line at a time, and if I
>> > > > >>> change this parameter, it listens.
>> > > > >>>
>> > > > >>> However, on a cluster only one node receives all the map tasks. Only one
>> > > > >>> map tasks is started. The others never get anything, they just wait. I've
>> > > > >>> added 100 seconds wait to the mapper - no change!
>> > > > >>>
>> > > > >>> Any advice?
>> > > > >>>
>> > > > >>> Thank you. Sincerely,
>> > > > >>> Mark

--
Harsh J
Customer Ops. Engineer
Cloudera | http://tiny.cloudera.com/about
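For reference, the new-API route that Praveen linked looks roughly like the sketch below once you are on a release that ships org.apache.hadoop.mapreduce.lib.input.NLineInputFormat (1.0.1 or later, per the note at the top). The class and path names (OneLinePerMapNewApi, InstructionMapper, args[0]/args[1]) are placeholders, not anything from the thread. Note that mapreduce.input.linerecordreader.line.maxlength only caps how long a single record may be; it does not influence how splits are formed or assigned, which is why setting it alone still produced one map task.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class OneLinePerMapNewApi {

  // Hypothetical mapper: each invocation gets one instruction line.
  public static class InstructionMapper
      extends Mapper<LongWritable, Text, Text, NullWritable> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      // ... the lengthy per-line processing would go here ...
      context.write(line, NullWritable.get());
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "one-line-per-map");
    job.setJarByClass(OneLinePerMapNewApi.class);

    // One line per split => one map task per line of the instructions file.
    job.setInputFormatClass(NLineInputFormat.class);
    NLineInputFormat.setNumLinesPerSplit(job, 1);

    job.setMapperClass(InstructionMapper.class);
    job.setNumReduceTasks(0); // map-only job
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(NullWritable.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}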
