Nick Schurch wrote:
> Hi all,
> 
> I've recently encountered a few problems when trying to use Galaxy which are
> really driving me away from using it as a bioinformatics platform for NGS. I
> was wondering if there are any simple solutions that I've missed...

Hi Nick,

We've had some internal discussion and proposed some solutions which
would hopefully make Galaxy more useful for your environment.

> Firstly, It seems that while there are a few solutions for getting large
> files (a few GB) into a local install of galaxy without going through HTTP,
> many tools that operate on these files produce multiple, uncompressed large
> files which quickly eat up the disk allocation. This is particularly
> significant in a workflow that has multiple processing steps which each
> leave behind a large file. With no way to compress or archive files produced
> by intermediate steps in a workflow, and no desire to delete them since I
> may need to go back to them and they can take hours to re-run, the only
> remaining option seems to be to save them elsewhere and then delete them.

We've dealt with this locally by implementing compression in the
underlying filesystem (ZFS), but this requires a fileserver that runs
Solaris (or a derivative) or FreeBSD.  Btrfs also supports compression,
but I would be more wary of trusting my data to btrfs since it is less
mature and lacks a tool for repairing corrupted filesystems.
FuseCompress would also be an option.
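
For reference, turning on ZFS compression is a one-line property change
(the dataset name below is just an example -- substitute your own):

```shell
# Enable compression on the ZFS dataset holding Galaxy's files.
# 'tank/galaxy' is an example dataset name, not a real path on your system.
zfs set compression=on tank/galaxy

# A specific algorithm can be chosen instead, e.g.:
#   zfs set compression=gzip tank/galaxy

# Later, check how much space compression is actually saving:
zfs get compressratio tank/galaxy
```

Note that the property only applies to data written after it is set;
existing files stay uncompressed until rewritten.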

We would strongly recommend performing regular backups regardless of any
filesystem-level choice.

Unfortunately this is a tricky problem to solve within Galaxy itself.
While some tools can operate on compressed files directly, many cannot
and so compressing all outputs could prove to be very CPU intensive and
a waste of time if the next step will have to decompress the file.
There has been some discussion of how to implement transparent
compression and other complex underlying data management directly in
Galaxy, but that work is unlikely to begin soon.

> And this brings me to the second problem. Getting large files out of Galaxy.
> The only way to save large files from Galaxy (that I can see) is the save
> icon, which downloads the file via http. This takes *ages* for a large file
> and also causes big headaches for my firefox browser. I've taken a quick
> peek at the Galaxy file system to see if I could just copy a file, but it's
> almost completely indecipherable if you want to find out what file in the
> file system corresponds to a file saved from a tool. Is there some way to
> get the location of a particular file on the galaxy file system, that I can
> just copy?

This is certainly something we can implement and will be working on
fairly soon.  There have been quite a few requests to integrate more
tightly with environments where Galaxy users exist as system users.

There's an issue in our tracker which you can follow here:

  https://bitbucket.org/galaxy/galaxy-central/issue/106/
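
In the meantime, if you have shell access to the Galaxy server, here is a
rough sketch of how one might locate a dataset on disk.  It assumes the
stock layout, where datasets live under database/files/ as
dataset_<id>.dat, grouped into subdirectories named after the first three
digits of the zero-padded id -- please verify this against your own
install, and note that galaxy_dataset_path is just a hypothetical helper,
not part of Galaxy:

```shell
# Hypothetical helper: compute the default on-disk path for a Galaxy
# dataset id, assuming the stock layout
#   database/files/<first 3 digits of 6-digit id>/dataset_<id>.dat
galaxy_dataset_path() {
  local id=$1
  local padded
  # Zero-pad the numeric id to six digits, e.g. 12345 -> 012345
  padded=$(printf '%06d' "$id")
  # The subdirectory is the first three digits of the padded id
  echo "database/files/${padded:0:3}/dataset_${id}.dat"
}

galaxy_dataset_path 12345
# -> database/files/012/dataset_12345.dat
```

Once you have the path, a plain cp or rsync from the fileserver avoids
the HTTP download entirely.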

--nate

> 
> -- 
> Cheers,
> 
> Nick Schurch
> 
> Data Analysis Group (The Barton Group),
> School of Life Sciences,
> University of Dundee,
> Dow St,
> Dundee,
> DD1 5EH,
> Scotland,
> UK
> 
> Tel: +44 1382 388707
> Fax: +44 1382 345 893

> _______________________________________________
> galaxy-user mailing list
> galaxy-u...@lists.bx.psu.edu
> http://lists.bx.psu.edu/listinfo/galaxy-user

_______________________________________________
To manage your subscriptions to this and other Galaxy lists, please use the 
interface at:

  http://lists.bx.psu.edu/
