Hi Ben,

I am running the setup now. Currently, Spark is able to launch backend
executors as long as there are resources released by Hadoop. But when
Hadoop is busy, Spark has to wait a while. This seems OK for now, because
in my use case Hadoop jobs have higher priority than Spark jobs.

Best.


Guodong


On Tue, Jun 4, 2013 at 6:07 AM, Benjamin Mahler
<[email protected]>wrote:

> You're right that we currently do not dynamically adjust the resources of
> the TaskTrackers launched by our Hadoop Scheduler. It's possible to do, but
> it requires a more complex design of the Hadoop framework, so we went with
> the easier route of launching statically sized TaskTrackers.
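>
> To make "statically sized" concrete, this is roughly what happens (a hedged
> sketch only; the per-slot numbers and names below are illustrative, not the
> scheduler's actual code or config):
>
>   import org.apache.hadoop.mapred.JobConf;
>
>   JobConf conf = new JobConf();  // picks up mapred-site.xml
>   int mapSlots    = conf.getInt("mapred.tasktracker.map.tasks.maximum", 2);
>   int reduceSlots = conf.getInt("mapred.tasktracker.reduce.tasks.maximum", 2);
>   double cpusPerSlot  = 1.0;     // illustrative per-slot sizing
>   int    memPerSlotMb = 1024;
>   double cpus = (mapSlots + reduceSlots) * cpusPerSlot;
>   int    mem  = (mapSlots + reduceSlots) * memPerSlotMb;
>   // The TaskTracker is launched from an offer that fits this fixed
>   // (cpus, mem) size, even if it later runs fewer tasks than it has slots.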
>
> Have you run the particular setup you're describing? It might be best to
> run it first and see how it behaves, to find out whether you actually run
> into resource starvation issues.
>
> There are some allocator improvements on the horizon which will help
> enforce administrator-specified allocation constraints for the different
> frameworks running in a cluster.
>
> Ben
>
>
> On Thu, May 30, 2013 at 7:38 PM, 王国栋 <[email protected]> wrote:
>
> > Hi,
> >
> > I am reading the code of the resource allocator in Mesos and trying to
> > understand it.
> >
> > Now we have HierarchicalAllocatorProcess. From the code, I think each
> > time the master will first offer resources to the framework whose
> > "resource share" is the smallest. But I have some questions about my use
> > case.
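> >
> > If I understand it right, the idea is roughly the following (a made-up
> > Java sketch of "smallest dominant share first", not the real C++
> > allocator code; the numbers are only for illustration):
> >
> >   class DrfSketch {
> >     // dominant share = max over resource types of (used / total)
> >     static double share(double cpusUsed, double memUsed,
> >                         double totalCpus, double totalMem) {
> >       return Math.max(cpusUsed / totalCpus, memUsed / totalMem);
> >     }
> >     public static void main(String[] args) {
> >       double totalCpus = 100, totalMem = 400;
> >       double hadoop = share(60, 120, totalCpus, totalMem);  // = 0.6
> >       double spark  = share(10, 200, totalCpus, totalMem);  // = 0.5
> >       // the next offer goes to the framework with the smaller share
> >       System.out.println(spark < hadoop ? "offer spark" : "offer hadoop");
> >     }
> >   }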
> >
> > We want to deploy hadoop-cdh3u5 and Spark on the same cluster of nodes
> > with Mesos. The Hadoop JobTracker will launch Mesos executors to replace
> > TaskTrackers. But each executor may have several map slots and reduce
> > slots, so each executor may hold the resources for those map/reduce slots
> > even if there are not that many running tasks. And an executor will not
> > exit until all of its tasks have finished.
> >
> > So here comes the problem. Hadoop may always have some submitted, running
> > jobs, but these jobs may not need all the resources allocated by the
> > JobTracker, e.g. each executor only runs one mapper. In this situation,
> > the Spark jobs will not get enough resources.
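> >
> > To put made-up numbers on it: on a 16-core node, if each TaskTracker is
> > sized for 8 slots at 1 CPU per slot, two mostly idle TaskTrackers already
> > hold all 16 CPUs, so Spark sees no usable offers even if only one or two
> > mappers are actually running.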
> >
> > I think I can set mapred.tasktracker.map.tasks.maximum and
> > mapred.tasktracker.reduce.tasks.maximum to a small number (e.g. 1 or 2),
> > so that each node may run multiple Mesos executors for Hadoop. But then
> > the JobTracker will have to communicate with a lot of TaskTrackers (this
> > is different from a traditional Hadoop cluster), and I am afraid the
> > JobTracker will become overwhelmed.
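> >
> > That is, something like this in mapred-site.xml (just to make the idea
> > concrete; the value 1 is only an example):
> >
> >   <property>
> >     <name>mapred.tasktracker.map.tasks.maximum</name>
> >     <value>1</value>
> >   </property>
> >   <property>
> >     <name>mapred.tasktracker.reduce.tasks.maximum</name>
> >     <value>1</value>
> >   </property>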
> >
> > Any ideas about this? I would be glad to hear any thoughts or advice.
> >
> > Best
> >
> > Guodong
> >
>
