Glad to see someone else is playing around with Mesos.
I have a Mesos branch that is getting a little long in the tooth. I'd like
to get a straight job runner (non-LWR, with a shared file system) running
under Mesos for Galaxy before I submit that work as a pull request.

The hackathon is only 12 days away! Hopefully we'll be able to make some
progress on these sorts of projects.

Kyle



On Sun, Jun 15, 2014 at 4:06 PM, John Chilton <jmchil...@gmail.com> wrote:

> Hey Kyle, all,
>
>   If anyone wants to play with running Galaxy jobs within an Apache
> Mesos environment I have added a prototype of this feature to the LWR.
>
>
> https://bitbucket.org/jmchilton/lwr/commits/555438d2fe266899338474b25c540fef42bcece7
>
> https://bitbucket.org/jmchilton/lwr/commits/9748b3035dbe3802d4136a6a1028df8395a9aeb3
>
> This work distributes jobs across a Mesos cluster and injects a
> MESOS_URL environment variable into the job runtime environment in
> case the jobs themselves want to take advantage of Mesos.
>
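To make the MESOS_URL injection described above concrete, here is a minimal sketch of how a runner might put the variable into a job's environment before launching it. This is not the LWR's actual code: the helper names `build_job_env` and `run_job` are hypothetical, and the example master address is only a placeholder.

```python
import os
import subprocess
import sys


def build_job_env(mesos_url, base_env=None):
    """Return a copy of the environment with MESOS_URL set for the job.

    Hypothetical helper: if no Mesos URL is configured, the variable is
    left out so jobs can tell that Mesos is unavailable.
    """
    env = dict(os.environ if base_env is None else base_env)
    if mesos_url:
        env["MESOS_URL"] = mesos_url
    else:
        env.pop("MESOS_URL", None)
    return env


def run_job(command, mesos_url):
    """Run a job command with MESOS_URL injected and return its stdout."""
    result = subprocess.run(
        command,
        env=build_job_env(mesos_url),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # The child process simply echoes the variable it was given.
    child = [sys.executable, "-c", "import os; print(os.environ['MESOS_URL'])"]
    print(run_job(child, "master.example.org:5050"))
```

The job itself can then decide whether to use Mesos; the runner only has to pass the address along.
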
> The advantage of the LWR versus a traditional Galaxy runner is that
> the job can be staged to remote resources without shared disk. Prior
> to this I was imagining the LWR to be useful in cases where Galaxy and
> the remote cluster don't share a common disk but where there is in fact
> a shared scratch directory or something similar across the remote
> cluster, as well as a resource manager. The LWR Mesos framework however
> has the actual compute servers themselves stage the job up and down - so
> you could imagine distributing Galaxy across large clusters without any
> shared disk whatsoever - that could be very cool and help scale, say,
> cloud applications.
>
> The downsides of an LWR-based approach versus a Galaxy approach are
> that it is less mature and there is more stuff to configure - you need
> to configure a Galaxy job_conf plugin and destination, the LWR itself,
> and a message queue (for this variant of LWR operation anyway - it
> should be possible to drive this via the LWR in web server mode but I
> haven't added it yet). I would be more than happy to continue to see
> progress toward Mesos support in Galaxy proper.
>
> It is strictly a prototype so far - a sort of playground if anyone
> wants to play with these ideas and build something cool. It really is
> a "framework" right - not so much a job scheduler so I am not sure it
> is very immediately useful - but I imagine one could build cool stuff
> on top of it.
>
> Next, I think I would like to add Apache Aurora
> (http://aurora.incubator.apache.org/) support - because it seems like
> a much more traditional resource manager but built on top of Mesos so
> it would be more practical for traditional Galaxy-style jobs. Doesn't
> buy you anything in terms of parallelization but it would "fit better"
> with Galaxy.
>
> -John
>
>
> On Sat, Oct 26, 2013 at 2:43 PM, Kyle Ellrott <kellr...@soe.ucsc.edu> wrote:
> > I think one of the aspects where Galaxy is a bit soft is the ability to
> > do distributed tasks. The current system of split/replicate/merge tasks
> > based on file type is a bit limited and hard for tool developers to
> > expand upon. Distributed computing is a non-trivial thing to implement,
> > and I think it would be a better use of our time to use an already
> > existing framework. It would also mean one less API for tool writers to
> > have to develop for.
> > I was wondering if anybody has looked at Mesos ( http://mesos.apache.org/ ).
> > You can see an overview of the Mesos architecture at
> > https://github.com/apache/mesos/blob/master/docs/Mesos-Architecture.md
> > The important thing about Mesos is that it provides an API for C/C++,
> > Java/Scala and Python to write distributed frameworks. There are already
> > implementations of frameworks for common parallel programming systems
> > such as:
> >  - Hadoop (https://github.com/mesos/hadoop)
> >  - MPI
> > (https://github.com/apache/mesos/blob/master/docs/Running-torque-or-mpi-on-mesos.md)
> >  - Spark (http://spark-project.org)
> > And you can find an example Python framework at
> > https://github.com/apache/mesos/tree/master/src/examples/python
> >
> > Integration with Galaxy would have three parts:
> > 1) Add a system config variable to Galaxy called 'MESOS_URL' that is then
> > passed to tool wrappers and allows them to contact the local Mesos
> > infrastructure (assuming the system has been configured), or pass a null
> > if the system isn't available.
> > 2) Write a tool runner that works as a Mesos framework to execute
> > single-CPU jobs on the distributed system.
> > 3) For instances where Mesos is not available at a system-wide level (say
> > they only have access to an SGE-based cluster), but the user wants to run
> > distributed jobs, write a wrapper that can create a Mesos cluster using
> > the existing queueing system. For example, right now I run a Mesos system
> > under the SGE queue system.
> >
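For part 1 of the list above, the tool-wrapper side could look roughly like the sketch below. It assumes the variable arrives as an environment variable and that "null" is represented either by the variable being absent, empty, or set to the literal string "null" or "none". The original proposal does not specify this, so treat it as one possible convention. The function names are hypothetical.

```python
import os


def get_mesos_url(environ=None):
    """Return the configured Mesos URL, or None if Mesos is unavailable.

    Assumption: an absent, empty, "null" or "none" value all mean that
    no Mesos infrastructure has been configured.
    """
    environ = os.environ if environ is None else environ
    value = environ.get("MESOS_URL", "").strip()
    if not value or value.lower() in ("null", "none"):
        return None
    return value


def choose_execution_mode(environ=None):
    """Pick 'distributed' when a Mesos URL is available, else 'local'."""
    return "distributed" if get_mesos_url(environ) else "local"


if __name__ == "__main__":
    print(choose_execution_mode())
```

A wrapper written this way still runs on Galaxy instances without Mesos; it just falls back to single-node execution.
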
> > I'm curious to see what other people think.
> >
> > Kyle
> >
> > ___________________________________________________________
> > Please keep all replies on the list by using "reply all"
> > in your mail client.  To manage your subscriptions to this
> > and other Galaxy lists, please use the interface at:
> >   http://lists.bx.psu.edu/
> >
> > To search Galaxy mailing lists use the unified search at:
> >   http://galaxyproject.org/search/mailinglists/
>