Thank you for this detailed description!

I already have a follow-up question.
I'm working on Galaxy Cloudman:

> Galaxy is at revision: 93cda3eb81 (master branch) from 11 Jun 2015


But I can only find "Build dataset pair|list", not "List of Dataset Pairs" as
in the video. In which version was that implemented?

Best,
Alexander

2015-06-15 10:17 GMT-05:00 John Chilton <jmchil...@gmail.com>:

> On Wed, Jun 10, 2015 at 4:04 PM, Alexander Vowinkel
> <vowinkel.alexan...@gmail.com> wrote:
> > Hi Folks,
> >
> > thank you so far for the previous help. I got much further.
> > Now I'm stuck with data collections.
> >
> > Because this is quite a list, I appreciate also answers to parts of my
> > questions ;)
> >
> > I have two issues:
> > A) manual definition of data collections (any type) by user and/or admin
> > B) definition of data collections as input/output of a tool and inside a
> > workflow
> >
> >
> > A) manual
> > Basically I would like to create
> > i) a list of fastq files (unpaired)
> > ii) a paired set of two fastq files
> > iii) a list of each two paired fastq files
> >
> > How can I do that?
> > By using the web app? As user? As admin?
> > By working via ssh on the server?
>
> So each of these got much easier/more robust with the most recent release.
>
> From the user perspective - for any of these options you will want to
> load the fastq files into a history, open the manage multiple datasets
> option
> (https://wiki.galaxyproject.org/Histories#Managing_Multiple_Datasets_Easily),
> select the datasets, and then choose the list type from the menu. Each
> will cause a widget to pop up allowing you to group the datasets (into
> a list, a pair, or a list of pairs depending on your selection).
>
> The most complicated option is the list of pairs - this option is
> demonstrated in the first video in Anton's recent NGS 101 -
> Reference-based RNA-seq series
> (https://vimeo.com/channels/884356/128265983). More information at
> https://wiki.galaxyproject.org/Learn/GalaxyNGS101.
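
The grouping step behind a list of pairs can be sketched in a few lines of Python. This is only an illustration of the idea, not what the Galaxy widget actually runs: the function name `group_into_pairs` and the filename convention (`<sample>_1.fastq` / `<sample>_2.fastq`, optionally `_R1` / `_R2` and `.gz`) are assumptions made for the example.

```python
import re
from collections import defaultdict

# Assumed naming convention (hypothetical): <sample>_1.fastq and <sample>_2.fastq,
# optionally written as _R1/_R2, with .fq as an alternative extension and optional .gz.
PAIR_PATTERN = re.compile(r"^(?P<sample>.+?)_R?(?P<read>[12])\.(fastq|fq)(\.gz)?$")


def group_into_pairs(filenames):
    """Group filenames into forward/reverse pairs keyed by sample name.

    Returns (pairs, unmatched): pairs maps sample -> {"forward": ..., "reverse": ...};
    unmatched is a sorted list of names that could not be paired.
    """
    reads = defaultdict(dict)
    unmatched = []
    for name in filenames:
        match = PAIR_PATTERN.match(name)
        if match is None:
            unmatched.append(name)
            continue
        role = "forward" if match.group("read") == "1" else "reverse"
        reads[match.group("sample")][role] = name

    pairs = {}
    for sample, found in sorted(reads.items()):
        if set(found) == {"forward", "reverse"}:
            pairs[sample] = found
        else:
            # Only one mate was present, so the file stays unpaired.
            unmatched.extend(found.values())
    return pairs, sorted(unmatched)
```

Files that do not follow the assumed convention, or that lack a mate, end up in the unmatched list and would have to be paired by hand.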
>
> For all user-centric scenarios - you will need to get the plain
> datasets into a history first. FTP upload for instance doesn't support
> creating collections directly - you can import datasets and then
> create them. Likewise - data libraries do not currently support
> dataset collections. I believe there are Trello cards for both of
> these issues.
>
> For admins - there is a dataset collection API - I can point you at
> examples if you want - but this doesn't seem to be your interest.
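
For the API route, a rough Python sketch of the request bodies may help. It only builds the JSON; nothing is sent. The field names (`type`, `collection_type`, `element_identifiers`, `src`) reflect my understanding of the history contents endpoint (POST to /api/histories/<history_id>/contents) and should be checked against your Galaxy release. The helper names and the dataset ids are hypothetical placeholders.

```python
import json


def hda(name, dataset_id):
    """Element that points at an existing history dataset (assumed src value "hda")."""
    return {"name": name, "src": "hda", "id": dataset_id}


def list_payload(name, datasets):
    """Flat list collection; datasets maps element name -> history dataset id."""
    return {
        "type": "dataset_collection",
        "collection_type": "list",
        "name": name,
        "element_identifiers": [hda(n, i) for n, i in datasets.items()],
    }


def list_of_pairs_payload(name, pairs):
    """List of pairs; pairs maps sample name -> (forward_id, reverse_id)."""
    elements = []
    for sample, (forward_id, reverse_id) in pairs.items():
        elements.append({
            "name": sample,
            "src": "new_collection",  # assumed marker for a nested collection
            "collection_type": "paired",
            "element_identifiers": [
                hda("forward", forward_id),
                hda("reverse", reverse_id),
            ],
        })
    return {
        "type": "dataset_collection",
        "collection_type": "list:paired",
        "name": name,
        "element_identifiers": elements,
    }


if __name__ == "__main__":
    # Placeholder ids; real ones come from the history contents listing.
    body = list_of_pairs_payload("rnaseq reads", {"sampleA": ("id_fwd", "id_rev")})
    print(json.dumps(body, indent=2))
```

A single pair would follow the same shape, with collection_type "paired" at the top level and the forward/reverse elements listed directly.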
>
> >
> >
> > B) in tool/workflow
> > Here I also have different approaches I would like to realize:
> > i) use a collection as input for a tool
> > ii) create a collection as output of a tool
> > ii.1) from known # of output parameters
> > ii.2) from unknown # of output parameters
> >
> > For these things I was trying to find some tools in toolshed to see how
> > they do it, but I couldn't quite adopt it.
>
> I would look in the following directory instead of the tool shed -
> https://github.com/galaxyproject/galaxy/tree/dev/test/functional/tools.
> These are the tools used to drive the testing of the collections
> implementation and contain some very stripped down examples of what is
> possible.
>
> >
> > i) use a collection as input for a tool
> > this is well documented - realizable by type="data_collection" and the
> > collection_type.
> > Unfortunately I can't test this because I can't create a collection so
> > far ;) - see A
>
> Indeed :). Good examples here are the tools in the RNA-seq
> pipeline - Tophat, Bowtie2, etc.
>
> >
> > ii) create a collection as output of a tool
> > Here it gets blurry for me.
>
> So one can get very far without ever creating an output from a tool
> explicitly. I contend most of the time - if you have a list of bam
> files and you want to create another list of bam files - you just want
> to map some operation over them. This is demonstrated in that RNA-seq
> outline - and talked about in a more theoretical way in my GCC talk
> from last year http://bit.ly/gcc2014workflows.
>
> There are definitely cases when you want to explicitly create
> collections though - the current best documentation on this is going
> to be the pull request that added them - not the implementation but
> the description which actually lays out these same categories and how
> to handle them with explicit complete examples.
>
> https://bitbucket.org/galaxy/galaxy-central/pull-request/634/allow-tools-to-explicitly-produce-dataset
>
> Hopefully this helps - please follow up with additional questions as
> you have them. I am keen to see more developers leveraging dataset
> collections.
>
> Thanks a bunch.
> -John
>
> >
> > ii.1) from known # of output parameters
> > Here I didn't find a tool. I just thought it might be a simpler case
> > than ii.2 and good to understand the concept.
> > I would be glad if someone could explain the way(s) to do this.
> >
> > ii.2) from unknown # of output parameters
> > For this I found barcode splitter tools (also from devteam) that have
> > different approaches.
> > But their output (defined in xml) is only some report file.
> > The output files seem to be fed into the history.
> > And here I don't know how to get hands on these files when I want to use
> > them to feed them into the next step during a workflow.
> >
> > Help highly appreciated!
> >
> > Thanks!
> > Alexander
> >