My reading of Capacity Scheduling is that it controls the number of jobs
scheduled at the level of the cluster.
My issue is not sharing at the level of the cluster - usually my job is the
only one running - but rather at the level of the individual machine.
  Some of my jobs require more memory and do significant processing -
especially in the reducer. While the cluster can happily schedule 8 smaller
tasks on a node, when, say, 8 of the larger ones are scheduled the slaves
run out of swap space and tend to crash.
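  If it helps to make this concrete, the per-node limit I am describing
comes from the TaskTracker slot settings in mapred-site.xml - a minimal
sketch, with illustrative values (8 map and 8 reduce slots per node):

    <!-- mapred-site.xml on each TaskTracker: fixed slot counts per node -->
    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>8</value>  <!-- up to 8 concurrent map tasks on this node -->
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>8</value>  <!-- up to 8 concurrent reduce tasks on this node -->
    </property>

These counts apply to every job equally, which is the heart of the problem:
the 8 slots are sized for the small tasks, not for the 2GB ones.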
  It is not clear to me that limiting the number of jobs on the cluster
will stop the scheduler from scheduling the maximum allowed number of tasks
on any one node.
  Even requesting multiple slots for a job seems to affect the number of
tasks running on the cluster, but not the number running on any specific
node.
  Am I wrong here? If I want, say, only three of my tasks running on any
one node, does asking for enough slots to guarantee that the total number
running is no more than 3 times the number of nodes actually guarantee
this?
   My read is that the total number of running tasks might be throttled,
but not the number per node.
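  For what it is worth, here is a minimal sketch of the memory-based
settings I understand the 0.20.x/1.x CapacityScheduler to use for this -
the values are only illustrative, and I may well be misreading how they
interact with the per-node limits:

    <!-- mapred-site.xml: enable the CapacityScheduler and size the slots -->
    <property>
      <name>mapred.jobtracker.taskScheduler</name>
      <value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
    </property>
    <property>
      <name>mapred.cluster.reduce.memory.mb</name>
      <value>1024</value>  <!-- one reduce slot is worth 1GB -->
    </property>
    <property>
      <name>mapred.cluster.max.reduce.memory.mb</name>
      <value>4096</value>  <!-- largest per-task request a job may make -->
    </property>

    <!-- configuration of the memory-hungry job itself -->
    <property>
      <name>mapred.job.reduce.memory.mb</name>
      <value>2048</value>  <!-- each reduce task asks for 2GB, i.e. 2 slots -->
    </property>

If a 2048MB request against 1024MB slots really makes each reduce task
occupy two of a node's reduce slots, that would cut the per-node count as
well - but that is exactly the part I am unsure about.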
  Perhaps a clever use of queues might help, but I am not quite sure about
the details.
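  The sort of thing I had in mind is sketched below - it assumes the stock
1.x property names and a hypothetical queue called "bigmem", and I have not
tried it:

    <!-- mapred-site.xml: declare a second queue alongside the default -->
    <property>
      <name>mapred.queue.names</name>
      <value>default,bigmem</value>
    </property>

    <!-- capacity-scheduler.xml: give the heavy-job queue a small share -->
    <property>
      <name>mapred.capacity-scheduler.queue.default.capacity</name>
      <value>75</value>
    </property>
    <property>
      <name>mapred.capacity-scheduler.queue.bigmem.capacity</name>
      <value>25</value>
    </property>

    <!-- job configuration: submit the memory-hungry job to that queue -->
    <property>
      <name>mapred.job.queue.name</name>
      <value>bigmem</value>
    </property>

As far as I can tell, though, this only caps the queue's share of slots
cluster-wide; I do not see how it keeps any single node from filling all of
its slots with the big tasks.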


On Thu, May 23, 2013 at 4:37 PM, Harsh J <ha...@cloudera.com> wrote:

> Your problem seems to surround available memory and over-subscription. If
> you're using a 0.20.x or 1.x version of Apache Hadoop, you probably want to
> use the CapacityScheduler to address this for you.
>
> I once detailed how-to, on a similar question here:
> http://search-hadoop.com/m/gnFs91yIg1e
>
>
> On Wed, May 22, 2013 at 2:55 PM, Steve Lewis <lordjoe2...@gmail.com>
> wrote:
>
> > I have a series of Hadoop jobs to run - one of my jobs requires larger
> > than standard memory; I allow the task to use 2GB of memory. When I run
> > some of these jobs the slave nodes are crashing because they run out of
> > swap space. It is not that a slave could not run one, or even 4, of
> > these jobs, but 8 stresses the limits.
> >  I could cut the mapred.tasktracker.reduce.tasks.maximum for the entire
> > cluster but this cripples the whole cluster for one of many jobs.
> > It seems to be a very bad design
> > a) to allow the job tracker to keep assigning tasks to a slave that is
> > already getting low on memory
> > b) to allow the user to run jobs capable of crashing nodes on the cluster
> > c) not to allow the user to specify that some jobs need to be limited to
> > a lower value without requiring this limit for every job.
> >
> > Are there plans to fix this??
> >
> > --
> >
>
>
>
> --
> Harsh J
>



-- 
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com
