On Mon, Feb 22, 2016 at 7:57 AM, Peter van Heusden <p...@sanbi.ac.za> wrote:
> Hi there
>
> ...
>
> 4) Currently parallelisation in Galaxy is supported using two mechanisms:
> collections and dataset splitters/tasks. Are there plans on extending and
> harmonising Galaxy's parallelisation capabilities?

I'm not sure there is anything formal, but chatting to John and others
at GCC2015 we recognised that the split/merge capabilities in the
Python datatype classes overlap a lot, functionally, with splitting
datasets into collections and merging collections back into datasets.

https://wiki.galaxyproject.org/Events/GCC2015/BoFs/DataSplittingAndParallelism

One idea we mooted was defining (pseudo) tools for dataset splitting
and merging using the existing datatype classes, with similar integration
into the framework as the datatype converter tools.

e.g. You could in principle merge a collection of text files using the
text datatype's merge functionality (which is essentially a cat
command).
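
To make the "essentially a cat command" point concrete, here is a
minimal standalone sketch of that kind of merge. This is not Galaxy's
actual datatype code; the function name merge_text_files and its
arguments are made up for illustration.

```python
import shutil


def merge_text_files(part_paths, merged_path):
    """Concatenate the parts, in the order given, into one output file.

    Hypothetical helper for illustration only; it mimics `cat part1 part2 ...`
    and is not the Galaxy text datatype's merge method.
    """
    with open(merged_path, "wb") as merged:
        for part_path in part_paths:
            with open(part_path, "rb") as part:
                # Stream each part in blocks so large files are not
                # loaded into memory all at once.
                shutil.copyfileobj(part, merged)
```

The order of the paths decides the order of the output, so a
collection-aware version would have to preserve the element order of
the collection.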

There are a lot of details to think about, particularly for splitting.
Currently, tool wrappers using parallelisation have some control over
how the input is split (e.g. split a large FASTA file into chunks of
1000 sequences), and that control might need to be exposed in any UI
for creating a collection from a single file.
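
As a rough illustration of the kind of control meant here, the sketch
below splits FASTA input into chunks of at most N records. It is a
standalone example, not the splitter Galaxy ships; split_fasta and
chunk_size are hypothetical names, and chunk_size stands in for the
setting a UI might need to expose.

```python
def split_fasta(lines, chunk_size=1000):
    """Yield lists of lines, each holding at most chunk_size FASTA records.

    Hypothetical helper for illustration only. A record starts at a line
    beginning with ">" and runs until the next such line.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    chunk = []
    records_in_chunk = 0
    for line in lines:
        if line.startswith(">"):
            # A new record begins; close the current chunk if it is full.
            if records_in_chunk == chunk_size:
                yield chunk
                chunk = []
                records_in_chunk = 0
            records_in_chunk += 1
        chunk.append(line)
    if chunk:
        # Emit the final, possibly smaller, chunk.
        yield chunk
```

Each yielded chunk could then be written out as one element of a
collection, e.g. by iterating over split_fasta(open(path), 1000).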

Peter