Another reason you may not see full utilization of your map task slots per
tracker is if the mean run time of a task is very short. All the slots are
being used, but the setup and teardown for each task take long enough,
compared to the run time of the task itself, that it appears that not all
the task slots are in use. (Illustrative numbers: if setup plus teardown
costs ~5 seconds and the task body runs for 2, each slot spends most of its
wall-clock time outside your map code.)
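
If per-task JVM startup is a big part of that overhead, one knob worth
trying (I believe task JVM reuse went in for 0.19, via HADOOP-249) is
mapred.job.reuse.jvm.num.tasks. A sketch for conf/hadoop-site.xml:

  <property>
    <name>mapred.job.reuse.jvm.num.tasks</name>
    <!-- -1 lets a JVM be reused by any number of tasks from the same job;
         the default of 1 starts a fresh JVM for every task -->
    <value>-1</value>
  </property>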


On Mon, Aug 17, 2009 at 10:35 PM, Amogh Vasekar <am...@yahoo-inc.com> wrote:

> While setting mapred.tasktracker.map.tasks.maximum and
> mapred.tasktracker.reduce.tasks.maximum, please consider the memory usage
> your application might have, since all tasks will be competing for the same
> memory and that contention might reduce overall performance.
>
> Thanks,
> Amogh
> -----Original Message-----
> From: Harish Mallipeddi [mailto:harish.mallipe...@gmail.com]
> Sent: Tuesday, August 18, 2009 10:37 AM
> To: common-user@hadoop.apache.org
> Subject: Re: utilizing all cores on single-node hadoop
>
> Hi Vasilis,
>
> Here's some info that I know:
>
> mapred.map.tasks - this is a job-specific setting. This is just a hint to
> InputFormat as to how many InputSplits (and hence MapTasks) you want for
> your job. The default InputFormat classes usually keep each split size at
> the HDFS block size (64 MB by default). So if your input data is smaller
> than 64 MB, it will result in only 1 split and hence only 1 MapTask.
>
> mapred.reduce.tasks - this is also a job-specific setting.
>
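> For illustration, here's a minimal sketch of setting both job-level values
> with the old mapred API (the class name and numbers are my own, made-up
> example, not something from this thread):
>
>   import org.apache.hadoop.mapred.JobConf;
>
>   public class ConfExample {
>     public static JobConf configure() {
>       JobConf conf = new JobConf(ConfExample.class);
>       conf.setNumMapTasks(16);   // only a hint; the InputFormat decides
>       conf.setNumReduceTasks(8); // honored exactly by the framework
>       return conf;
>     }
>   }
>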
> mapred.tasktracker.map.tasks.maximum
> mapred.tasktracker.reduce.tasks.maximum
>
> The above 2 are tasktracker-specific config options and determine how many
> "simultaneous" MapTasks and ReduceTasks run on each TT. Ideally, on an
> 8-core box, you would want to set map.tasks.maximum to something like 6 and
> reduce.tasks.maximum to 4 to utilize all 8 cores to the maximum (that's a
> little bit of over-subscription, to account for tasks idling while doing
> I/O).
>
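> For reference, a sketch of how those two settings would look in
> conf/hadoop-site.xml on the tasktracker node (the values are just the ones
> suggested above; the TT has to be restarted to pick them up):
>
>   <property>
>     <name>mapred.tasktracker.map.tasks.maximum</name>
>     <value>6</value>
>   </property>
>   <property>
>     <name>mapred.tasktracker.reduce.tasks.maximum</name>
>     <value>4</value>
>   </property>
>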
> In the web admin console, how many map-tasks and reduce-tasks are reported
> to have been launched for your job?
>
> Cheers,
> Harish
>
> On Tue, Aug 18, 2009 at 5:47 AM, Vasilis Liaskovitis <vlias...@gmail.com>
> wrote:
>
> > Hi,
> >
> > I am a beginner trying to setup a few simple hadoop tests on a single
> > node before moving on to a cluster. I am just using the simple
> > wordcount example for now. My question is: what's the best way to
> > guarantee utilization of all cores on a single node? So, assuming a
> > single node with 16 cores, what are the suggested values for:
> >
> > mapred.map.tasks
> > mapred.reduce.tasks
> >
> > mapred.tasktracker.map.tasks.maximum
> > mapred.tasktracker.reduce.tasks.maximum
> >
> > I found an old similar thread
> > http://www.mail-archive.com/hadoop-u...@lucene.apache.org/msg00152.html
> > and I have followed similar settings for my 16-core system (e.g.
> > map.tasks=reduce.tasks=90 and map.tasks.maximum=100), however I always
> > see only 3-4 cores utilized using top.
> >
> > - The description for mapred.map.tasks says "Ignored when
> > mapred.job.tracker is "local"", and in my case
> > mapred.job.tracker=hdfs://localhost:54311.
> > Is it possible that the map.tasks and reduce.tasks I am setting are
> > being ignored? How can I verify this? Is there a way to enforce my
> > values even in a localhost scenario like this?
> >
> > - Are there other config options/values that I need to set besides the
> > 4 I mentioned above?
> >
> > - Also is it possible that for short tasks, I won't see full
> > utilization of all cores anyway? Something along those lines was
> > mentioned in an issue from a year ago:
> > http://issues.apache.org/jira/browse/HADOOP-3136
> > "If the individual tasks are very short i.e. run for less than the
> > heartbeat interval the TaskTracker serially runs one task at a time"
> >
> > I am using hadoop-0.19.2
> >
> > thanks for any guidance,
> >
> > - Vasilis
> >
>
>
>
> --
> Harish Mallipeddi
> http://blog.poundbang.in
>



-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals
