Unfortunately the snapshotting process used by the resize operation is really
slow, as you've noticed, and there's not much we can do to make it faster.
We've discussed other methods for growing the filesystem but haven't finished
them yet.

For working with S3, instead of s3fs you're more than welcome to give
Galaxy's S3ObjectStore a shot.  There isn't much documentation available
for it right now, and I'd still say it's a beta feature in need of more
testing and optimization, but to enable it define the following options in
your universe_wsgi.ini:

# Object store mode (valid options are: disk, s3, distributed, hierarchical)
object_store = s3
aws_access_key = <your access key>
aws_secret_key = <your secret key>
s3_bucket = <a bucket name for all your files>
use_reduced_redundancy = True

# Size (in GB) that the cache used by the object store should be limited to.
# If the value is not specified, the cache size will be limited only by the
# file system size.
object_store_cache_size = <decide based on the size of the EBS volume you
want to use as scratch>

What this will do is use the EBS volume as working space, exactly as Galaxy
does now.  Additionally, it'll push datasets to S3 and delete them from
local disk as necessary (least recently touched deleted first) to stay
beneath object_store_cache_size.  If a dataset has been evicted, it's simply
fetched back from S3 as needed, but most of the time you'll be working on
disk directly and pushing to S3 in the background.
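To make the eviction behavior concrete, here's a minimal sketch of the
policy described above (least recently touched deleted first, staying under
a size limit).  This is illustrative only, not Galaxy's actual object store
code; the function name and walk-the-cache-directory approach are my own:

```python
import os

def evict_lru(cache_dir, limit_bytes):
    """Delete least-recently-accessed files until total size <= limit_bytes.

    Hypothetical sketch: Galaxy can safely delete local copies because the
    authoritative copy of each dataset already lives in S3.
    """
    files = []
    for root, _, names in os.walk(cache_dir):
        for name in names:
            path = os.path.join(root, name)
            st = os.stat(path)
            files.append((st.st_atime, st.st_size, path))
    total = sum(size for _, size, _ in files)
    # Sort by access time: oldest ("least recently touched") first
    for _, size, path in sorted(files):
        if total <= limit_bytes:
            break
        os.remove(path)  # would be re-fetched from S3 on next access
        total -= size
    return total
```

The real implementation pushes to S3 in the background before anything is
deleted, so eviction never loses data.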

I'd be more than happy to work with you if you run into any issues trying
this out.  This is something we've wanted to firm up for a while now, and
having a real live test case would be useful!

-Dannon


On Tue, Feb 26, 2013 at 7:24 PM, Scooter Willis <hwil...@scripps.edu> wrote:

> Used cloud man to create a new cluster on Feb 22 and picked 500GB as the
> initial size of the data drive. Working with TCGA exome DNA seq data didn't
> take long to fill that up. Used the cloud man admin interface to resize
> from 500GB to 1TB and the resize operation took 15 hours. Not sure if that
> is expected so wanted to give some heads up in case that is an area for
> optimization.
>
> Since I now have a local storage problem as I need to work with more than
> 1TB of data I tried to go the route of setting a S3 bucket using Fuse. Ran
> into a problem where the first s3fs software I tried to install had a
> version issue with Ubuntu 10.
>
> I remember something in a support email that better support for Amazon S3
> was in the works. Can you provide any guidance or thoughts on how to work
> with more than 1TB of data using cost effective S3 versus expensive EBS?
> The same applies for storing results at S3.
>
> With s3fs the file system can hide many of the complexities of moving
> files back and forth with caching where working with 30GB+ files isn't
> going to be fun.
>
> Thanks
>
> Scooter
>
>
>
> ___________________________________________________________
> Please keep all replies on the list by using "reply all"
> in your mail client.  To manage your subscriptions to this
> and other Galaxy lists, please use the interface at:
>
>   http://lists.bx.psu.edu/
>