Have you tried using data libraries for this?  They have an import mechanism
that lets you link to files already on disk instead of copying/uploading
them.  I believe the "example_watch_folder.py" sample script (in the
distribution) does exactly this via the API, if you want an example; a rough
sketch is below as well.
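
Something along these lines with BioBlend (the Python client for the Galaxy
API) should work.  This is an untested sketch -- the method and parameter
names (upload_from_galaxy_filesystem, link_data_only='link_to_files') and the
paths/URL/key are from memory or placeholders, so please check them against
the BioBlend docs for your Galaxy version:

    from bioblend.galaxy import GalaxyInstance

    # Sketch only -- URL, API key, and file paths are placeholders.
    gi = GalaxyInstance(url='http://localhost:8080', key='YOUR_API_KEY')

    # Create a library and a folder to hold one batch of files.
    library = gi.libraries.create_library('Bulk import')
    folder = gi.libraries.create_folder(library['id'], 'batch-001')[0]

    # Newline-separated list of server-side paths; 'link_to_files' asks
    # Galaxy to link to the files in place rather than copying them.
    paths = '\n'.join(['/data/on/server/a.bed', '/data/on/server/b.bed'])
    gi.libraries.upload_from_galaxy_filesystem(
        library['id'],
        paths,
        folder_id=folder['id'],
        file_type='bed',
        link_data_only='link_to_files',
    )

Note that (if I remember correctly) the server also needs
allow_library_path_paste = True in the Galaxy config for the path-based
upload option to be available.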


On Mon, Sep 23, 2013 at 5:15 PM, Ted Goldstein <t...@soe.ucsc.edu> wrote:

> I want to frequently import many tens of thousands of datasets. The files
> are on the same server as Galaxy, but the upload-based mechanism is really,
> really slow. It takes hours to load this many files, even though the data
> isn't actually moving at all!
>
> What is the best strategy for a faster bulk import? I can imagine a tight
> loop along the lines of the pseudocode below.
>
> My datatypes are limited.
>
> foreach folder:
>     new LibraryFolder
>
> foreach file in each folder:
>     new LibraryDataset
>     new LibraryDatasetDatasetAssociation
>
> flush once at the end.
>
> Thoughts?
>
>
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/