On Thu, Sep 4, 2014 at 5:55 PM, Suresh Marru <[email protected]> wrote:
> Eran,
>
> This is a good read and in fact sounds very similar in situation (picking a well-known solution vs. writing our own).
>
> "As you may recollect, Airavata’s key challenge is in identifying the resources which have the shortest queue time across many resources."
>
> Well... to be precise, Airavata needs to identify the resource which allows the user application to execute with a minimum time. Queue time is only one factor which decides that. Resources accessible to the community account are another factor. There are more factors the scheduler needs to take into account, e.g. speed, memory, number of cores per node, etc. If you want to make the scheduler more interesting you can also consider parameters such as job placement within nodes, network connectivity, NUMA patterns, etc. But I think those are too much, at least for an initial version of the scheduler.
>
> Thanks,
> -Amila
>
> And of course, it will have use cases like re-using cloud resources for individual jobs that are part of a larger workflow (a flavor of your thesis topic, if you still remember) and so on. So my question is: are Mesos' or Aurora's use cases limited to managing a fixed set of resources, that is, the challenge of spreading M jobs across N resources efficiently with fair-share and varying memory and I/O requirements? Or did you also come across examples which will resonate with meta-schedulers interacting with multiple lower-level schedulers?
>
> Thanks,
> Suresh
>
> On Sep 4, 2014, at 5:38 PM, Eran Chinthaka Withana <[email protected]> wrote:
>
> > oops, sorry. Here it is:
> > http://www.mail-archive.com/[email protected]/msg01417.html
> >
> > Thanks,
> > Eran Chinthaka Withana
> >
> > On Thu, Sep 4, 2014 at 2:22 PM, Suresh Marru <[email protected]> wrote:
> >
> >> Hi Eran, Jijoe,
> >>
> >> Can you share the missing reference you indicate below?
> >>
> >> Of course, by all means it is good for Airavata to build over projects like Mesos; that is my motivation for this discussion.
> >> I am not yet suggesting implementing a scheduler; that would be a distraction. The meta-scheduler I illustrated is mere routing to be injected into Airavata job management with a simple FIFO. We look forward to hearing options from you all on what the right third-party software is. Manu Singh, a first-year graduate student at IU, volunteers to do an academic study of these solutions, so we will appreciate pointers.
> >>
> >> Suresh
> >>
> >> On Sep 3, 2014, at 11:59 AM, Eran Chinthaka Withana <[email protected]> wrote:
> >>
> >>> Hi,
> >>>
> >>> Before you go ahead and implement this on your own, consider reading this mail thread [1] and looking at how frameworks like Apache Aurora do it on top of Apache Mesos. These may provide good inputs for this implementation.
> >>>
> >>> (thanks also to Jijoe, who provided input for this)
> >>>
> >>> Thanks,
> >>> Eran Chinthaka Withana
> >>>
> >>> On Wed, Sep 3, 2014 at 5:50 AM, Suresh Marru <[email protected]> wrote:
> >>>
> >>>> Thank you all for comments and suggestions. I summarized the discussion as an implementation plan on a wiki page:
> >>>>
> >>>> https://cwiki.apache.org/confluence/display/AIRAVATA/Airavata+Metascheduler
> >>>>
> >>>> If this is amenable, we can take this to the dev list to plan the development in two phases: first implement the Throttle-Job in the short term, and then plan the Auto-Scheduling capabilities.
> >>>>
> >>>> Suresh
> >>>>
> >>>> On Sep 2, 2014, at 1:50 PM, Gary E. Gorbet <[email protected]> wrote:
> >>>>
> >>>>> It seems to me that among the many possible functions a metascheduler (MS) would provide, there are two separate ones that must be addressed first. The two use cases implied are as follows.
> >>>>>
> >>>>> (1) The gateway submits a group of jobs to a specified resource where the count of jobs exceeds the resource’s queued-job limit.
> >>>>> Let’s say 300 very quick jobs are submitted, where the limit is 50 per community user. The MS must maintain an internal queue and release jobs to the resource in groups with job counts under the limit (say, 40 at a time).
> >>>>>
> >>>>> (2) The gateway submits a job or set of jobs with a flag that specifies that Airavata choose the resource. Here, MCP or some other mechanism arrives eventually at the specific resource that completes the job(s).
> >>>>>
> >>>>> Where both uses are needed - unspecified resource and a group of jobs with count exceeding limits - the MS action would be best defined by knowing the definitions and mechanisms employed in the two separate functions. For example, if MCP is employed, the initial brute-force test submissions might need to be done using the determined number of jobs at a time (e.g., 40). But the design here must adhere to design criteria arrived at for both function (1) and function (2).
> >>>>>
> >>>>> In UltraScan’s case, the most immediate need is for (1). The user could manually determine the best resource or just make a reasonable guess. What the user does not want to do is manually release jobs 40 at a time. The gateway interface allows specification of a group of 300 jobs, and the user does not care what is going on under the covers to effect the running of all of them eventually. So, I guess I am lobbying for addressing (1) first; both to meet UltraScan’s immediate need and to elucidate the design of more sophisticated functionality.
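[The internal-queue behaviour described in use case (1) can be sketched roughly as follows. This is a minimal illustration, not Airavata code: the class and method names are hypothetical, and the limit of 50 and batch size of 40 come from the Stampede example in the thread.]

```python
from collections import deque

class ThrottleQueue:
    """Holds jobs the gateway has accepted but not yet released, so the
    per-community-user queued-job limit on the resource is never exceeded."""

    def __init__(self, queued_job_limit=50, batch_size=40):
        self.pending = deque()   # jobs waiting inside the gateway (FIFO)
        self.submitted = set()   # jobs currently queued/running on the resource
        self.limit = queued_job_limit
        self.batch_size = batch_size

    def enqueue(self, job_id):
        self.pending.append(job_id)

    def on_job_finished(self, job_id):
        # Called when the resource reports a job done; frees a slot.
        self.submitted.discard(job_id)

    def release(self):
        """Return the next batch of job ids to submit to the resource,
        never exceeding the resource's queued-job limit."""
        room = min(self.batch_size, self.limit - len(self.submitted))
        batch = [self.pending.popleft()
                 for _ in range(min(max(room, 0), len(self.pending)))]
        self.submitted.update(batch)
        return batch
```

[With 300 jobs enqueued, the first `release()` yields 40 jobs, the second only 10 (hitting the 50-job limit), and further batches flow as completions free slots.]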
> >>>>>
> >>>>> - Gary
> >>>>>
> >>>>> On Sep 2, 2014, at 12:02 PM, Suresh Marru <[email protected]> wrote:
> >>>>>
> >>>>>> Hi Kenneth,
> >>>>>>
> >>>>>> On Sep 2, 2014, at 12:44 PM, K Yoshimoto <[email protected]> wrote:
> >>>>>>
> >>>>>>> The tricky thing is the need to maintain an internal queue of jobs when the Stampede queued-jobs limit is reached. If Airavata has an internal representation for jobs to be submitted, I think you are most of the way there.
> >>>>>>
> >>>>>> Airavata has an internal representation of jobs, but there is no good global view of all the jobs running on a given resource for a given community account. We are trying to fix this; once this is done, as you say, the FIFO implementation should be straightforward.
> >>>>>>
> >>>>>>> It is tricky to do resource-matching scheduling when the job mix is not known. For example, the scheduler does not know whether to preserve memory vs. cores when deciding where to place a job. Also, the interactions of the gateway scheduler and the local schedulers may be complicated to predict.
> >>>>>>>
> >>>>>>> Fair share is probably not a good idea. In practice, it tends to disrupt the other scheduling policies such that one group is penalized and the others don't run much earlier.
> >>>>>>
> >>>>>> Interesting. What do you think of the capacity-based scheduling algorithm (linked below)?
> >>>>>>
> >>>>>>> One option is to maintain the gateway job queue internally, then use the MCP brute-force approach: submit to all resources, then cancel after the first job start. You may also want to allow the gateway to set per-resource policy limits on number of jobs, job duration, job core size, SUs, etc.
> >>>>>>
> >>>>>> MCP is something we should try.
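[The MCP-style brute force Kenneth describes (submit everywhere, keep whichever starts first, cancel the rest) reduces to a small polling loop. A sketch under assumptions: `submit`, `has_started`, and `cancel` are hypothetical stand-ins for whatever per-resource adapters the gateway exposes, not real Airavata APIs.]

```python
import time

def mcp_submit(job, resources, submit, has_started, cancel, poll_interval=5.0):
    """Submit `job` to every resource; when the first copy starts running,
    cancel the duplicates and return the winning (resource, handle) pair.

    submit(resource, job) -> handle
    has_started(handle)   -> bool
    cancel(handle)        -> None
    All three are caller-supplied adapter callables."""
    handles = {r: submit(r, job) for r in resources}
    while True:
        for resource, handle in handles.items():
            if has_started(handle):
                # Winner found: withdraw the redundant submissions.
                for other, h in handles.items():
                    if other != resource:
                        cancel(h)
                return resource, handle
        time.sleep(poll_interval)  # none started yet; poll again later
```

[Note the obvious cost: N-1 cancelled submissions per job, which is why the thread also discusses per-resource policy limits to keep the probing polite.]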
> >>>>>> The limits per gateway per resource exist, but we need to exercise these capabilities.
> >>>>>>
> >>>>>> Suresh
> >>>>>>
> >>>>>>> On Tue, Sep 02, 2014 at 07:50:12AM -0400, Suresh Marru wrote:
> >>>>>>>> Hi All,
> >>>>>>>>
> >>>>>>>> We need some guidance on identifying a scheduling strategy and a pluggable third-party implementation for Airavata's scheduling needs. For context, let me describe the use cases for scheduling within Airavata:
> >>>>>>>>
> >>>>>>>> * If a gateway/user is submitting a series of jobs, Airavata is currently not throttling them and is sending them to compute clusters (in a FIFO way). Resources enforce a per-user job limit within a queue to ensure fair use of the clusters (example: Stampede allows 50 jobs per user in the normal queue [1]). Airavata will need to implement queues and throttle jobs respecting the max-jobs-per-queue limits of an underlying resource queue.
> >>>>>>>>
> >>>>>>>> * The current version of Airavata is also not performing job scheduling across available computational resources, and expects gateways/users to pick resources during experiment launch. Airavata will need to implement schedulers which become aware of existing loads on the clusters and spread jobs efficiently. The scheduler should be able to get access to heuristics on previous executions and current requirements, which include job size (number of nodes/cores), memory requirements, wall-time estimates, and so forth.
> >>>>>>>>
> >>>>>>>> * As Airavata is mapping multiple individual user jobs into one or more community-account submissions, it also becomes critical to implement fair-share scheduling among these users to ensure fair use of allocations as well as allowable queue limits.
> >>>>>>>>
> >>>>>>>> Other use cases?
> >>>>>>>>
> >>>>>>>> We will greatly appreciate it if folks on this list can shed light on experiences using schedulers implemented in Hadoop, Mesos, Storm, or other frameworks outside of their intended use. For instance, the Hadoop (YARN) capacity [2] and fair schedulers [3][4][5] seem to meet the needs of Airavata. Is it a good idea to attempt to reuse these implementations? Are there any other pluggable third-party alternatives?
> >>>>>>>>
> >>>>>>>> Thanks in advance for your time and insights,
> >>>>>>>>
> >>>>>>>> Suresh
> >>>>>>>>
> >>>>>>>> [1] - https://www.tacc.utexas.edu/user-services/user-guides/stampede-user-guide#running
> >>>>>>>> [2] - http://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
> >>>>>>>> [3] - http://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
> >>>>>>>> [4] - https://issues.apache.org/jira/browse/HADOOP-3746
> >>>>>>>> [5] - https://issues.apache.org/jira/browse/YARN-326
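[For the fair-share use case in the thread (many gateway users multiplexed onto one community account), the core decision reduces to: each time a submission slot opens, pick the pending job of whichever user is currently furthest below their share. A minimal sketch of that selection rule, for illustration only; the YARN Fair Scheduler linked in [3] implements a far more complete version, and all names here are hypothetical.]

```python
def pick_next_user(pending, running, shares):
    """Pick the user whose next job should be released to the resource.

    pending: user -> number of jobs queued inside the gateway
    running: user -> number of that user's jobs already on the resource
    shares:  user -> relative share weight (all 1.0 for equal shares)

    Returns the user with the smallest usage-to-share ratio among users
    who actually have work waiting, or None if nobody does."""
    candidates = [u for u, n in pending.items() if n > 0]
    if not candidates:
        return None
    return min(candidates, key=lambda u: running.get(u, 0) / shares[u])
```

[For example, if alice already has 10 jobs running and bob only 2, with equal shares, bob's pending job is released first, even if alice's jobs were enqueued earlier. This is the behaviour Kenneth cautions about: fair-share reorders the plain FIFO policy, so one user's backlog can be deferred in favour of lighter users.]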
