Guys,
First, all these concerns are the point of DuraCloud: cloud-side "storage"
and "services". Richard Rodgers built an S3 assetstore prototype, but as
you point out, there are latency issues to address with cloud storage.
We (@mire, Tim Donohue, Richard Rodgers) are all trying to coordinate on a
body of work that we hope will be accepted and presented at OR11. It
entails the next iteration of support and tools for getting content into
DuraCloud from within the DSpace application itself. However, at the
moment this still requires (or at least recommends) a local assetstore
cache of that content for performance reasons.
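To illustrate the local-cache idea, here is a minimal read-through cache sketch. Every name in it (RemoteStore, LocalCache, CountingRemote) is hypothetical and made up for this example; none of it is DuraCloud or DSpace API, and the "remote" side is just an in-memory stand-in for a slow cloud store.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a slow, cloud-side store (not a real DuraCloud API).
interface RemoteStore {
    byte[] fetch(String id);
}

// Read-through cache: serve the local copy when present,
// otherwise fetch from the remote store once and keep the result.
class LocalCache {
    private final RemoteStore remote;
    private final Map<String, byte[]> local = new HashMap<>();

    LocalCache(RemoteStore remote) {
        this.remote = remote;
    }

    byte[] read(String id) {
        byte[] cached = local.get(id);
        if (cached == null) {
            cached = remote.fetch(id);
            if (cached != null) {
                local.put(id, cached);
            }
        }
        return cached;
    }
}

// Fake remote that counts how often it is contacted.
class CountingRemote implements RemoteStore {
    int fetches = 0;

    public byte[] fetch(String id) {
        fetches++;
        return ("content of " + id).getBytes(StandardCharsets.UTF_8);
    }
}

class LocalCacheDemo {
    public static void main(String[] args) {
        CountingRemote remote = new CountingRemote();
        LocalCache cache = new LocalCache(remote);
        cache.read("123");
        cache.read("123"); // second read is served from the local copy
        System.out.println("remote fetches: " + remote.fetches);
    }
}
```

Repeated reads of the same bitstream only pay the network cost once, which is why the local cache matters for things like indexing.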
Finally, the DSpace 2.0 dspace-storage service is meant to eventually
replace the assetstore in DSpace with one or more implementations configured
within the ServiceManager. What we need are folks buying into the
dspace-services and dspace-storage solutions if they are going to explore
creating "new implementations" of the assetstore. There's still a great
deal of work to be done in this area to "integrate" the storage work into
DSpace 1.x. But suffice it to say, we need to either gut
BitstreamStorageManager or wrap it in a "Service" object, and swap out our
current application code to use that instead. Then we can implement new
variations and configure them via Spring.
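A rough sketch of what "wrap it in a Service object" could look like. This is only an illustration under my own assumptions: BitstreamStore, InMemoryStore and BitstreamStorageService are hypothetical names, not the actual dspace-storage or BitstreamStorageManager API, and the in-memory map stands in for a real assetstore.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical storage contract; NOT the real dspace-storage interface.
interface BitstreamStore {
    void put(String id, byte[] content);
    byte[] get(String id);
}

// One possible implementation. An S3-backed or DuraCloud-backed class
// could implement the same interface without touching application code.
class InMemoryStore implements BitstreamStore {
    private final Map<String, byte[]> data = new HashMap<>();

    public void put(String id, byte[] content) {
        data.put(id, content.clone());
    }

    public byte[] get(String id) {
        byte[] content = data.get(id);
        if (content == null) {
            throw new IllegalArgumentException("No bitstream with id " + id);
        }
        return content.clone();
    }
}

// Application code talks only to this service. The store is passed in
// through the constructor, so a container such as Spring could choose
// the implementation in configuration.
class BitstreamStorageService {
    private final BitstreamStore store;

    BitstreamStorageService(BitstreamStore store) {
        this.store = store;
    }

    void store(String id, String text) {
        store.put(id, text.getBytes(StandardCharsets.UTF_8));
    }

    String retrieve(String id) {
        return new String(store.get(id), StandardCharsets.UTF_8);
    }
}

class StorageServiceDemo {
    public static void main(String[] args) {
        BitstreamStorageService service =
                new BitstreamStorageService(new InMemoryStore());
        service.store("42", "hello");
        System.out.println(service.retrieve("42"));
    }
}
```

The point of the constructor argument is that swapping the backing store becomes a configuration change rather than a code change.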
Further documentation and prototypes will be coming in the next month or so.
Help is welcome in this area.
Mark
On Fri, Mar 18, 2011 at 12:15 PM, Peter Dietz <[email protected]> wrote:
> Hi Joseph,
>
> Going with S3 would actually be a great way to break the "we can't put
> it in the repository because we'll run out of disk space" barrier, and
> for cheap. Many repo admins will also likely want to consult the
> "trusted repository" handbook, and to check their legal rights to move
> uploaded files to other storage silos. As a result, our university
> has a massive data center (and massive data center costs).
>
> The s3fs option sounds like the quickest one to accomplish.
> Refactoring DSpace to have a pluggable asset-storage system, and then
> implementing it for S3, would take some effort; hopefully someone
> more knowledgeable can chime in (there may be some prior art).
>
> The downside of having to make a network connection for disk access
> shows up when you do an index-init or filtermedia, and have to make
> network requests for what are typically fast disk accesses. It should
> work fine, but those tasks will be much slower. This is likely less
> of a problem if you're going with Amazon EC2.
>
> All that said, having S3 would be useful for managing multiple
> development environments, where rsyncing production's assetstore to an
> external drive connected to each computer becomes a chore. Not sure if
> rsync to S3 is much better, though.
>
> Also, the demo.dspace.org site resides wholly in Amazon EC2, likely
> on an EBS filesystem. So there's nothing wrong with Amazon; it's just
> that, whatever solution you use, your virtual CPU and virtual disk
> should be as close together as possible. Or, perhaps, as close as
> possible to the end user.
>
> @Hardy, I don't think the 64GB max per file is going to slow me down
> any. Our entire repo is about that size, and that's thousands of files.
>
> On 3/18/11, Pottinger, Hardy J. <[email protected]> wrote:
> > Hi, I'm certainly not an expert in this area, but from my quick read,
> > depending on the use case for your repository, this looks like something
> > that might work. One thing to be aware of is the 64GB max file size
> > imposed
> > by s3fs, and the potential for S3's "Eventual Consistency" model to cause
> > "problems" with user submissions. More details on the s3fs wiki:
> > http://code.google.com/p/s3fs/wiki/EventualConsistency
> >
> > I'm interested in hearing more about this, if anyone has actually played
> > around with putting an assetstore on s3fs.
> >
> > --Hardy
> >
> >> -----Original Message-----
> >> From: Joseph Rhoads [mailto:[email protected]]
> >> Sent: Friday, March 18, 2011 1:12 PM
> >> To: [email protected]
> >> Subject: [Dspace-devel] Using Amazon S3 for an Assetstore
> >>
> >> I've seen some talk about integrating Amazon S3 as an assetstore (or
> >> bitstream store as it's sometimes called).
> >>
> >>
> >>
> >> Has anyone tried using something like s3fs, a "FUSE-based file system on
> >> Amazon" ?
> >>
> >> (I know there are several flavors of the same idea around but
> >> http://code.google.com/p/s3fs/ seems like a fairly mature one. Another
> >> is http://code.google.com/p/s3ql/ )
> >>
> >>
> >>
> >> And just using a directory on the mounted fs as your directory for the
> >> assetstore.
> >>
> >> Are there subtleties that I haven't noticed (after a 10 minute first
> >> glance) that would make it apparent that this is a bad idea?
> >>
> >> Has anyone done this successfully?
> >>
> >>
> >>
> >> -Joseph
> >
> >
> >
> > ------------------------------------------------------------------------------
> > Colocation vs. Managed Hosting
> > A question and answer guide to determining the best fit
> > for your organization - today and in the future.
> > http://p.sf.net/sfu/internap-sfd2d
> > _______________________________________________
> > Dspace-devel mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/dspace-devel
> >
>
>
> --
> Peter Dietz
>
--
Mark R. Diggory
@mire - www.atmire.com
2888 Loker Avenue East - Suite 305 - Carlsbad - CA - 92010
Technologielaan 9 - 3001 Heverlee - Belgium