Peter -

My plans for pre-GCC workflow work are sort of outlined in this issue:
https://github.com/galaxyproject/planemo/issues/408 (I want an
abstract for GCC and BOSC like "Planemo – A Scientific Workflow SDK").

I've been doing most of my work out of this branch
https://github.com/galaxyproject/galaxy/compare/dev...common-workflow-language:cwl.
It has my work in progress on CWL support; collection operations
(rejected once from Galaxy here
https://github.com/galaxyproject/galaxy/pull/1313 but these are so
important that I'm going to take another stab at pushing them into
Galaxy); and work on expression tools to produce values that will
hopefully tie back into workflows as connections for non-data
parameters, both as Galaxy-native entities and CWL-based entities.

There have been some completely valid complaints about the background
workflow scheduling being slow and buggy. These will need to be fixed
by 16.04, since all workflows will be executed this way as of then. I
also hope to take another pass at subworkflows: better tracking of
sources, allowing subworkflow steps to be upgraded, and fixing glaring
bugs like https://github.com/galaxyproject/galaxy/issues/1739.

Peter C. mentioned splitting and joining files into/from collections
in workflows based on the datatype methods (so hooking into
parallelism) - I have some initial WIP on this here
https://github.com/jmchilton/galaxy/commit/c4d93acdb3b0f89b970b7c3d17b965be8ab3ba30
as part of this branch
https://github.com/jmchilton/galaxy/tree/split_merge_collections. I
spent a couple of hours on it, and I think with another day or two I'd
have a usable prototype to hack on. I don't remember running into any
big hurdles. (So the answer to your last question is a definitive
yes.)

Sam started a bunch of work on completely replacing the
workflow form with an API-driven one here
https://github.com/galaxyproject/galaxy/pull/1249. I know he hopes to
have that done in 16.04. It will allow us to delete a bunch of paths
through the workflow code and should allow future developments to be
made more rapidly. It will also ensure everything is coming through
the API, which means Galaxy's test coverage of workflow stuff will be
much higher (given our depth of workflow API tests).

I'm happy to have a hangout to discuss this more. I consider the
planemo issue something of a roadmap for what I want to work on in the
first half of 2016, but I might get pulled away or told the project
has other priorities.

As for scheduling workflows instead of jobs - this is intriguing and
would probably be needed to get streaming working well in
Galaxy. So I would say I want to work on it someday, but I probably
won't get to it in 2016. If others want to hack on it, that is
fantastic, but it is also a difficult feat. (At least scheduling out
and optimizing pieces of the workflow is. Kyle Ellrott, Dannon, and I
had some interesting ideas about scheduling whole workflows on local
Galaxy instances running on a cluster and just collecting the outputs.
That would be significantly more doable, given that I sort of sculpted
the changes made to backgrounding workflows to preserve things for
doing that, though the work left is probably still a hard task.)

Hope this helps.

-John

On Mon, Feb 22, 2016 at 7:57 AM, Peter van Heusden <p...@sanbi.ac.za> wrote:
> Hi there
>
> I see from the PR landing in Galaxy and the comments on things like issue
> #1701 (https://github.com/galaxyproject/galaxy/issues/1701) that there's
> lots of work happening on the workflow side of Galaxy. This is an area of
> interest at SANBI too, so we'd like to coordinate development efforts as
> much as possible. To this end:
>
> 1) Are there forks to track so we can see what new code is landing?
> 2) Is there a roadmap for workflow work or perhaps can we have a Hangout to
> talk about this?
> 3) Specifically in terms of workflows and parallelisation: are there any
> plans to work on running workflows as opposed to just generating lots of
> jobs? I know this is a major change to how Galaxy works - it would mean
> something like submitting a workflow specification to a job runner that is
> located on the cluster, and then returning the results of workflow
> execution.
> 4) Currently parallelisation in Galaxy is supported using two mechanisms:
> collections and dataset splitters/tasks. Are there plans on extending and
> harmonising Galaxy's parallelisation capabilities?
>
> Thanks,
> Peter
>
>
>
> ___________________________________________________________
> Please keep all replies on the list by using "reply all"
> in your mail client.  To manage your subscriptions to this
> and other Galaxy lists, please use the interface at:
>   https://lists.galaxyproject.org/
>
> To search Galaxy mailing lists use the unified search at:
>   http://galaxyproject.org/search/mailinglists/