Swift is indeed a complete framework for distributed computing.
Distributing files out to cluster nodes, starting processes, and bringing
result files back to the submit host are all done out of the box (stagein-exec-stageout).
We can discuss offline if you are interested in giving it a shot.
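
For anyone unfamiliar with the stagein-exec-stageout pattern, here is a minimal sketch of it in Python. This is not Swift's implementation or API: run_staged and its parameters are hypothetical names chosen for illustration, and a local scratch directory stands in for a remote cluster node.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def run_staged(command, inputs, outputs, dest_dir):
    """Run `command` through one stage-in / exec / stage-out cycle.

    Hypothetical helper for illustration only; a temporary directory
    plays the role of the remote node's working directory.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as scratch:
        scratch = Path(scratch)
        # Stage in: copy each input file into the scratch directory.
        for src in inputs:
            shutil.copy(src, scratch / Path(src).name)
        # Exec: run the command with the scratch directory as its cwd.
        subprocess.run(command, cwd=scratch, check=True)
        # Stage out: copy the declared result files back to the submit side.
        staged = []
        for name in outputs:
            staged.append(Path(shutil.copy(scratch / name, dest_dir / name)))
    return staged


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work:
        work = Path(work)
        (work / "in.txt").write_text("hello")
        code = "open('out.txt', 'w').write(open('in.txt').read().upper())"
        print(run_staged([sys.executable, "-c", code],
                         [work / "in.txt"], ["out.txt"], work / "results"))
```

The command only ever sees files in its own working directory, which is what lets the same tool run unchanged wherever the scratch directory happens to live.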
On Mon, Oct 28, 2013 at 4:14 PM, Kyle Ellrott <kellr...@soe.ucsc.edu> wrote:
> You probably are a good person to get an opinion from. My plan isn't to
> write new frameworks, but rather use existing libraries that can
> communicate with Mesos to set up their parallel environments.
> But for Swift, you would probably want to write a new framework. Just
> looking at Swift, I imagine one of the harder parts is getting the
> system set up on a cluster (i.e., distributing files out to remote nodes,
> making sure that you have a way to start processes on those nodes and have
> them know where to find the master). It seems like Swift could benefit from
> having a Mesos-based framework. Do you think it would enable you to have a
> 'zero-config' startup of a distributed Swift application?
> On Mon, Oct 28, 2013 at 1:51 PM, Ketan Maheshwari <
> ketancmaheshw...@gmail.com> wrote:
>> Hi Kyle,
>> We have a similar ongoing development wherein we are working on
>> integrating our Swift framework ( swift-lang.org ) with Galaxy. The goal
>> is to enable Galaxy based applications to run on a variety of distributed
>> resources via various integration schemes as suitable to application and
>> underlying execution environment.
>> Here is an abstract of a paper (co-authored with Ravi, who responded on
>> this thread) we will be presenting in a workshop at the upcoming SC 13
>> "The Galaxy platform is a web-based science portal for scientific
>> computing supporting Life Sciences users community. While user-friendly and
>> intuitive for doing small to medium scale computations, it currently has a
>> limited support for large-scale, parallel and distributed computing. The
>> Swift parallel scripting framework is capable of composing ordinary
>> applications into parallel scripts that can be run on multi-scale
>> distributed and performance computing platforms. In complex distributed
>> environments, often the user end of application lifecycle slows down
>> because of the technical complexities brought in by the scale, access
>> methods and resource management nuances. Galaxy offers a simple way of
>> designing, composing, executing, reusing, and reproducing application runs.
>> An integration between Swift and Galaxy systems can accelerate science as
>> well as bring the respective user communities together in an interactive,
>> user-friendly, parallel and distributed data analysis environment enabled
>> on a broad range of computational infrastructures."
>> Kindly let us know if you need a hands on for the various tools we have
>> already developed.
>> On Mon, Oct 28, 2013 at 3:07 PM, Kyle Ellrott <kellr...@soe.ucsc.edu> wrote:
>>> I don't think implementation will be very difficult. The bigger question
>>> is: is this a technology people are open to?
>>> The nearest competitor is YARN (
>>> Mesos seems a bit more geared toward general purpose usage (with several
>>> existing frameworks), while YARN seems more specific to Hadoop. But I'd be
>>> glad to hear some other thoughts.
>>> On Mon, Oct 28, 2013 at 12:55 PM, Ravi K Madduri <madd...@mcs.anl.gov> wrote:
>>>> This is something I am very interested in. The three parts below make
>>>> sense to me. I would be very happy to discuss further and provide any help
>>>> to move this forward.
>>>> On Oct 26, 2013, at 2:43 PM, Kyle Ellrott <kellr...@soe.ucsc.edu>
>>>> I think one of the aspects where Galaxy is a bit soft is the ability to
>>>> do distributed tasks. The current system of split/replicate/merge tasks
>>>> based on file type is a bit limited and hard for tool developers to expand
>>>> upon. Distributed computing is a non-trivial thing to implement and I think
>>>> it would be a better use of our time to use an already existing framework.
>>>> And it would also mean one less API for tool writers to have to develop
>>>> I was wondering if anybody has looked at Mesos (
>>>> http://mesos.apache.org/ ). You can see an overview of the Mesos
>>>> architecture at
>>>> The important thing about Mesos is that it provides an API for C/C++,
>>>> Java/Scala and Python to write distributed frameworks. There are already
>>>> implementations of frameworks for common parallel programming systems such as:
>>>> - Hadoop (https://github.com/mesos/hadoop)
>>>> - MPI (
>>>> - Spark (http://spark-project.org)
>>>> And you can find an example Python framework at
>>>> Integration with Galaxy would have three parts:
>>>> 1) Add a system config variable to Galaxy called 'MESOS_URL' that is
>>>> then passed to tool wrappers and allows them to contact the local mesos
>>>> infrastructure (assuming the system has been configured) or pass a null if
>>>> the system isn't available.
>>>> 2) Write a tool runner that works as a Mesos framework to execute
>>>> single-CPU jobs on the distributed system.
>>>> 3) For instances where mesos is not available at a system wide level
>>>> (say they only have access to an SGE based cluster), but the user wants to
>>>> run distributed jobs, write a wrapper that can create a mesos cluster using
>>>> the existing queueing system. For example, right now I run a Mesos system
>>>> under the SGE queue system.
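
As a rough illustration of part 1, a tool wrapper might read the proposed MESOS_URL variable as sketched below. Only the variable name comes from the proposal above; the helper names, the accepted URL forms, and the fallback to local execution are assumptions, and nothing here is existing Galaxy or Mesos API.

```python
import os
from urllib.parse import urlsplit

# 5050 is the conventional default port for a Mesos master.
DEFAULT_MESOS_PORT = 5050


def resolve_mesos_master(environ=None):
    """Return (host, port) parsed from MESOS_URL, or None if it is unset.

    Hypothetical helper: accepting both "host:port" and "scheme://host:port"
    is an assumption about how the variable would be populated.
    """
    environ = os.environ if environ is None else environ
    url = environ.get("MESOS_URL", "").strip()
    if not url:
        return None
    # Prefix "//" so urlsplit treats a bare "host:port" as a network location.
    parts = urlsplit(url if "://" in url else "//" + url)
    if not parts.hostname:
        return None
    return parts.hostname, parts.port or DEFAULT_MESOS_PORT


def choose_backend(environ=None):
    """Pick "mesos" when a master is configured, otherwise run locally."""
    return "mesos" if resolve_mesos_master(environ) else "local"


if __name__ == "__main__":
    print(choose_backend(), resolve_mesos_master())
```

Treating an empty or missing value as "no Mesos available" matches the "pass a null" behaviour described in part 1, so the same wrapper works on systems with and without a cluster.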
>>>> I'm curious to see what other people think.
>>>> Please keep all replies on the list by using "reply all"
>>>> in your mail client. To manage your subscriptions to this
>>>> and other Galaxy lists, please use the interface at:
>>>> To search Galaxy mailing lists use the unified search at:
>>>> Ravi K Madduri
>>>> MCS, Argonne National Laboratory
>>>> Computation Institute, University of Chicago