Hi Nick,

Yes, these nextgen reads files are huge and getting bigger every quarter!
 But there will be storage issues no matter whether you use Galaxy or not.
 In fact, I think users are more likely to clean up files and histories in
Galaxy than they are to clean up NFS folders -- out of sight, out of mind!

Firstly, I think unnecessary intermediate files are more of a problem than
whether or not a file is compressed.  Indeed, just transferring these files
back and forth from the cluster takes a while, not to mention the delay in
waiting for each step to be rescheduled.  So I created a tool that does the
job of the FASTQ groomer, end-trimmer, and pair processing, plus a few
other simple tasks -- all in one shot.  I haven't uploaded it to the Tool
Shed yet, but I will.  I hate to duplicate existing tools, but I have a lot
of seq data.  I will also create a fastqilluminabz2 datatype and include it
with the tool.
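The core of the one-shot idea is tiny -- something along these lines (a
simplified sketch, not the actual tool; the function names, trim
parameters, and bz2 handling are illustrative):

```python
import bz2
from typing import Iterator, TextIO, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)

def open_reads(path: str) -> TextIO:
    """Open plain or bz2-compressed FASTQ transparently."""
    return bz2.open(path, "rt") if path.endswith(".bz2") else open(path)

def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield (header, seq, qual) records from a FASTQ stream."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return
        seq = handle.readline().rstrip()
        handle.readline()                  # skip the '+' separator line
        qual = handle.readline().rstrip()
        yield header, seq, qual

def process(records: Iterator[Record], trim5: int = 0, trim3: int = 0,
            min_len: int = 20) -> Iterator[Record]:
    """End-trim and length-filter reads in one pass -- no intermediates."""
    for header, seq, qual in records:
        end = len(seq) - trim3 if trim3 else len(seq)
        seq, qual = seq[trim5:end], qual[trim5:end]
        if len(seq) >= min_len:
            yield header, seq, qual
```

Pair processing slots in as just another generator in the same chain; the
point is that each read is touched once and nothing is written to disk
between steps.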

For getting files into Galaxy, I created a simple tool that lets staff
enter NFS paths, with the option to either copy or symlink if the location
is considered stable.  For security, I allow only certain folders (e.g.
/home, /storage) and added a password.  Similarly, for getting a file out,
all you need is a dinky tool that lets users provide a destination path.
Since I've got Galaxy running as a special galaxy user in a special galaxy
group, file access is restricted (as it should be), so I tell users to
create a dropbox folder in their home directory (and chmod 777 it).  With a
tool like this, you don't need to care how Galaxy names its files; I
deliberately try not to mess around under the hood.  I can upload these to
the Galaxy Tool Shed, but like I said, there isn't much to them.
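The path check is the only part with any substance.  Roughly (a
hypothetical sketch -- the whitelist and function names are illustrative,
and the real tool also checks a password):

```python
import os
import shutil

ALLOWED = ("/home/", "/storage/")  # site-specific whitelist (illustrative)

def check_allowed(src: str, allowed=ALLOWED) -> str:
    """Reject paths outside the whitelisted folders, resolving ../ tricks."""
    real = os.path.realpath(src)
    if not real.startswith(tuple(allowed)):
        raise ValueError("path not in an allowed folder: %s" % real)
    return real

def import_path(src: str, dest: str, link: bool = False) -> str:
    """Copy the file into Galaxy, or symlink it if the location is stable."""
    src = check_allowed(src)
    if link:
        os.symlink(src, dest)
    else:
        shutil.copy2(src, dest)
    return dest
```

The export tool is the same idea in reverse: validate the destination (the
chmod 777 dropbox folder), then copy the dataset out under a sensible name,
so nobody has to decipher Galaxy's internal file layout by hand.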

Ed

On Wed, Feb 9, 2011 at 4:17 AM, Nick Schurch <n.schu...@dundee.ac.uk> wrote:

>
> Hi all,
>
> I've recently encountered a few problems when trying to use Galaxy which
> are really driving me away from using it as a bioinformatics platform for
> NGS. I was wondering if there are any simple solutions that I've missed...
>
> Firstly, it seems that while there are a few solutions for getting large
> files (a few GB) into a local install of Galaxy without going through HTTP,
> many tools that operate on these files produce multiple, uncompressed large
> files which quickly eat up the disk allocation. This is particularly
> significant in a workflow that has multiple processing steps which each
> leave behind a large file. With no way to compress or archive files produced
> by intermediate steps in a workflow, and no desire to delete them since I
> may need to go back to them and they can take hours to re-run, the only
> remaining option seems to be to save them externally and then delete them.
>
> And this brings me to the second problem. Getting large files out of
> Galaxy. The only way to save large files from Galaxy (that I can see) is the
> save icon, which downloads the file via HTTP. This takes *ages* for a large
> file and also causes big headaches for my firefox browser. I've taken a
> quick peek at the Galaxy file system to see if I could just copy a file, but
> it's almost completely indecipherable if you want to find out what file in
> the file system corresponds to a file saved from a tool. Is there some way
> to get the location of a particular file on the galaxy file system, that I
> can just copy?
>
> --
> Cheers,
>
> Nick Schurch
>
> Data Analysis Group (The Barton Group),
> School of Life Sciences,
> University of Dundee,
> Dow St,
> Dundee,
> DD1 5EH,
> Scotland,
> UK
>
> Tel: +44 1382 388707
> Fax: +44 1382 345 893
>
>
> _______________________________________________
> galaxy-user mailing list
> galaxy-u...@lists.bx.psu.edu
> http://lists.bx.psu.edu/listinfo/galaxy-user
>
>
_______________________________________________
To manage your subscriptions to this and other Galaxy lists, please use the 
interface at:

  http://lists.bx.psu.edu/
