Re: [galaxy-dev] uploading multi-file archives and creation of potentially large collections

Björn Grüning Fri, 05 Aug 2016 00:45:27 -0700

Hi Stephan and Gildas,

I think a tool is a nice addition. Archives can be arbitrarily complex and
nested, so a tool would make perfect sense.
If this tool works well and we know which features we need and how
they should be displayed to the user, it will be easier to move this kind of
functionality into the upload box without making the upload too
complicated and scaring users :)
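
A minimal sketch of the grouping step such a tool could start from, assuming
a zip archive whose top-level folders correspond to conditions (the
one-collection-per-folder idea Gildas mentions below). It uses only the Python
standard library and does not touch the Galaxy API; the function name
group_members_by_folder and the convention of filing root-level files under
an empty key are made up for illustration.

```python
import io
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def group_members_by_folder(archive):
    """Map each top-level folder of a zip archive to its file members.

    `archive` is a path or a file-like object. Files sitting at the
    archive root are grouped under the key "" (a convention chosen for
    this sketch only). Directory entries are skipped.
    """
    groups = defaultdict(list)
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # Member names inside a zip always use "/" as separator.
            parts = PurePosixPath(info.filename).parts
            folder = parts[0] if len(parts) > 1 else ""
            groups[folder].append(info.filename)
    return {folder: sorted(names) for folder, names in groups.items()}


if __name__ == "__main__":
    # Build a small in-memory archive to try the function on.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("control/a.txt", "a")
        zf.writestr("treated/b.txt", "b")
    buf.seek(0)
    print(group_members_by_folder(buf))
```

Each key would then become one candidate collection and each member one
dataset. Deeper nesting is flattened into the top-level folder here, which
is exactly the kind of decision that is easier to try out in a tool first.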


Just my 2 cents,
Bjoern


On 05.08.2016 at 09:18, Gildas Le Corguillé wrote:
> Hi Stephan,
> 
> I will only answer the part about uploading zip files. Since release_16.04, a
> zip datatype has been integrated into the Galaxy distribution, but without
> a sniffer, so your users will have to select the zip datatype manually
> before uploading.
> 
> I also want to write this kind of tool, one able to extract a
> zip file and produce dataset collections. I would also like to add the
> possibility of creating one dataset collection per folder
> (condition/phenotype). I started something like that last week but ...
> So, in the short term, I want to offer this tool to help my users
> switch to dataset collections, and I want this transition to be as smooth as
> possible (users are sometimes stubborn).
> 
> Thanks for asking; I will follow this thread closely.
> 
> 
> Gildas
> 
> -----------------------------------------------------------------
> Gildas Le Corguillé - Bioinformatician/Bioanalyste
> 
> Plateform ABiMS (Analyses and Bioinformatics for Marine Science)
> http://abims.sb-roscoff.fr
> 
> Member of the Workflow4Metabolomics project
> http://workflow4metabolomics.org
> 
> Station Biologique de Roscoff - UPMC/CNRS - FR2424
> Place Georges Teissier 29680 Roscoff FRANCE
> tel: +33 2 98 29 23 81
> ------------------------------------------------------------------
> 
> 
> 
>> On 4 August 2016 at 17:44, Stephan Oepen <[email protected]> wrote:
>>
>> colleagues,
>>
>> in our adaptation of galaxy for large-scale natural language
>> processing, a fairly common use pattern is to invoke a workflow on a
>> potentially large number of text files.  hence, i am wondering about
>> facilities for uploading an archive (in ‘.zip’ or ‘.tgz’ format, say)
>> containing several files, where i would like the upload tool to
>> extract the files from the archive, import each individually into my
>> history, and (maybe optionally) create a list collection for the set
>> of files.
>>
>> in my current galaxy instance (running version 2015.03), when i upload
>> a multi-file ‘.zip’ file, part of the above actually happens: however,
>> the upload tool only imports the first file extracted from the archive
>> (and helpfully shows a warning message on the corresponding history
>> entry).  have there been relevant changes in this neighborhood in more
>> recent galaxy releases?
>>
>> related to the above, we have started to experiment with potentially
>> large collections and are beginning to worry about the scalability of
>> the collection mechanism.  in principle, we would like to operate on
>> collections comprised of tens or hundreds of thousands of individual
>> datasets.  what are common collection sizes (in the number of
>> components, not so much in the aggregate file size) used in other
>> galaxy instances to date?  what kind of gut reaction do galaxy
>> developers have to the idea of a collection containing, say, a hundred
>> thousand entries?
>>
>> with thanks in advance,
>> ___________________________________________________________
>> Please keep all replies on the list by using "reply all"
>> in your mail client.  To manage your subscriptions to this
>> and other Galaxy lists, please use the interface at:
>>  https://lists.galaxyproject.org/
>>
>> To search Galaxy mailing lists use the unified search at:
>>  http://galaxyproject.org/search/mailinglists/