Hi Josh,

Now that you bring up DSpace as being part of the equation...

You might want to look at the newly released "Replication Task Suite" plugin/addon for DSpace (supports DSpace versions 1.8.x & 3.0):

https://wiki.duraspace.org/display/DSPACE/ReplicationTaskSuite

This DSpace plugin does essentially what you are talking about...

It allows you to back up (i.e. replicate) DSpace content files and metadata (in the form of a set of AIPs, Archival Information Packages) to a local filesystem/drive or to cloud storage. Plus it provides an "auditing" tool to audit changes between DSpace and the cloud storage provider. Currently, the only cloud storage plugin we have created for the Replication Task Suite is for DuraCloud. But it wouldn't be too hard to create a new plugin for Glacier (if you wanted to send DSpace content directly to Glacier without DuraCloud in between).

The code is in GitHub at:
https://github.com/DSpace/dspace-replicate

If you decide to use it and create anything cool, feel free to send us a pull request.

Good luck,

- Tim

--
Tim Donohue
Technical Lead for DSpace Project
DuraSpace.org

On 1/11/2013 1:45 PM, Joshua Welker wrote:
Thanks for bringing up the issue of the cost of making sure the data is 
consistent. We will be using DSpace for now, and I know DSpace has some 
checksum functionality built in out-of-the-box. It shouldn't be too difficult 
to write a script that loops through DSpace's checksum data and compares it 
against the files in Glacier. According to the Glacier FAQ on Amazon's site, 
they provide an archive inventory (updated daily) that can be downloaded as 
JSON, and I've read comments from some users saying that this inventory 
includes checksum data. So hopefully it will just be a matter of comparing the 
local checksum to the Glacier checksum, which would be easy enough to script.
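Josh's comparison script can be sketched in Python. One wrinkle worth knowing: the Glacier inventory reports each archive's checksum in a SHA256TreeHash field, which is a SHA-256 tree hash (SHA-256 over 1 MiB chunks, then combined pairwise), not a plain MD5, while DSpace's checksum checker typically records MD5. So the local side has to compute the same tree hash before anything can be compared. This is a minimal sketch under stated assumptions: it hypothetically assumes each archive's ArchiveDescription was set to a DSpace bitstream identifier at upload time, and that `local_hashes` has already been built by running `tree_hash` over the local files. It reads whole files into memory, which is fine for illustration but not for very large bitstreams.

```python
import hashlib
import json

CHUNK = 1024 * 1024  # Glacier tree hashes are built from 1 MiB leaf chunks


def tree_hash(data: bytes) -> str:
    """Compute a Glacier-style SHA-256 tree hash of a byte string (hex)."""
    if not data:
        return hashlib.sha256(b"").hexdigest()
    # Leaf level: SHA-256 of each 1 MiB chunk (the last chunk may be shorter).
    level = [hashlib.sha256(data[i:i + CHUNK]).digest()
             for i in range(0, len(data), CHUNK)]
    # Hash adjacent pairs until one digest is left; an odd leftover moves up unchanged.
    while len(level) > 1:
        nxt = [hashlib.sha256(level[i] + level[i + 1]).digest()
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])
        level = nxt
    return level[0].hex()


def find_mismatches(inventory_json: str, local_hashes: dict) -> list:
    """Compare local tree hashes against a downloaded Glacier vault inventory.

    local_hashes maps a key to a local tree hash. ASSUMPTION (hypothetical):
    the key is whatever was stored as ArchiveDescription when uploading,
    e.g. a DSpace bitstream identifier.
    """
    inventory = json.loads(inventory_json)
    remote = {a["ArchiveDescription"]: a["SHA256TreeHash"]
              for a in inventory["ArchiveList"]}
    problems = []
    for key, local in local_hashes.items():
        if key not in remote:
            problems.append((key, "missing"))
        elif remote[key] != local:
            problems.append((key, "mismatch"))
    return problems
```

Keep in mind that this only compares against the hash Amazon reports for what it stored; it does not by itself prove the bytes are still retrievable.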

Josh Welker


-----Original Message-----
From: Code for Libraries [mailto:[email protected]] On Behalf Of Ryan Eby
Sent: Friday, January 11, 2013 11:37 AM
To: [email protected]
Subject: Re: [CODE4LIB] Digital collection backups

As Aaron alludes to, your decision should be based on your real needs, and the 
options might not be mutually exclusive.

LOCKSS/MetaArchive might be worth the money if it is the community archival 
aspect you are going for. Depending on your institution, being a participant 
might make political/mission sense regardless of the storage needs, and it 
could be just a specific collection that makes sense to include.

Glacier is a great choice if you are looking to spread a backup across 
regions. S3 is similar, if you also want to benefit from CloudFront (the CDN 
setup) to take load off your institution's server (you can now use CloudFront 
with your own origin server as well). Depending on your bandwidth, this might be 
worth the money regardless of LOCKSS participation (which can be more of a dark 
archive).
Amazon also tends to drop prices over time rather than raise them, but as with 
any outsourced service you have to plan for the possibility that it might not 
exist in the future. Also look closely at Glacier's pricing in terms of checking 
your data for consistency. There have been a few papers on the cost of making 
sure Amazon really has the proper data, which depends on how often your 
requirements call for checking.
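Ryan's point that consistency checking gets expensive with frequency can be made concrete with a back-of-the-envelope model. This is a rough sketch, not Amazon's pricing formula: it assumes each audit retrieves and downloads the whole collection once, and every rate is a caller-supplied placeholder (no real prices are implied). Glacier's actual retrieval pricing has been more complicated than a flat per-GB rate, so treat the result as an approximation.

```python
def annual_audit_cost(collection_gb: float, audits_per_year: int,
                      retrieval_per_gb: float, transfer_per_gb: float,
                      free_retrieval_gb_per_year: float = 0.0) -> float:
    """Rough yearly cost of pulling a full collection back for fixity checks.

    All rates are placeholders supplied by the caller; none are real prices.
    Assumes every audit retrieves and downloads the entire collection once.
    """
    retrieved_gb = collection_gb * audits_per_year
    # Only retrieval beyond any free allowance is billed at the retrieval rate.
    billable_retrieval_gb = max(0.0, retrieved_gb - free_retrieval_gb_per_year)
    # Outbound transfer is charged on everything that was downloaded.
    return billable_retrieval_gb * retrieval_per_gb + retrieved_gb * transfer_per_gb


if __name__ == "__main__":
    # Hypothetical 2 TB collection, audited quarterly vs. monthly.
    for audits in (4, 12):
        print(audits, round(annual_audit_cost(2000, audits, 0.01, 0.10), 2))
```

Because cost grows linearly with audits per year, sampling a fraction of the collection per audit is one common way to trade assurance against cost.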

Another option, if you are just looking for more geographic distribution, is to 
find an institution or service provider that will colocate. There may be another 
small institution that would love to shove a cheap box with hard drives on your 
network in exchange for the same. It's not as involved or formal as LOCKSS, but 
it gives you something you control to satisfy your requirements. It could also 
be as low-tech as shipping SSDs to another institution that then runs BagIt 
checksum validation on the drive, etc.
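For the shipped-drive option, the receiving site's check can be as simple as re-hashing every payload file listed in a BagIt manifest (manifest-md5.txt, one "checksum  path" pair per line). A minimal stdlib-only sketch, assuming an MD5 manifest and POSIX-style relative paths; a full BagIt validator would also check the tag manifests and bag-info.txt and flag payload files that are missing from the manifest, which this does not.

```python
import hashlib
from pathlib import Path


def verify_bag_manifest(bag_dir: str, algorithm: str = "md5") -> list:
    """Re-hash each file listed in a BagIt payload manifest.

    Returns a list of (relative_path, problem) tuples; empty means all good.
    """
    bag = Path(bag_dir)
    manifest = bag / f"manifest-{algorithm}.txt"
    failures = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Each line is "<checksum><whitespace><path>"; the path may contain spaces.
        expected, rel_path = line.split(None, 1)
        target = bag / rel_path
        if not target.is_file():
            failures.append((rel_path, "missing"))
            continue
        h = hashlib.new(algorithm)
        with target.open("rb") as fh:
            # Hash in 1 MiB blocks so large files are not read into memory at once.
            for block in iter(lambda: fh.read(1 << 20), b""):
                h.update(block)
        if h.hexdigest() != expected.lower():
            failures.append((rel_path, "checksum mismatch"))
    return failures
```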

All of the above should be scriptable in your workflow. Just need to decide 
what you really want out of it.

Eby


On Fri, Jan 11, 2013 at 11:52 AM, Aaron Trehub <[email protected]> wrote:

Hello Josh,

Auburn University is a member of two Private LOCKSS Networks: the
MetaArchive Cooperative and the Alabama Digital Preservation Network
(ADPNet).  Here's a link to a recent conference paper that describes
both networks, including their current pricing structures:

http://conference.ifla.org/past/ifla78/216-trehub-en.pdf

LOCKSS has worked well for us so far, in part because supporting
community-based solutions is important to us.  As you point out,
however, Glacier is an attractive alternative, especially for
institutions that may be more interested in low-cost, low-throughput
storage and less concerned about entrusting their content to a
commercial outfit or having to pay extra to get it back out.  As with
most things, you pay your money--more or less, depending--and make your choice.
And take your risks.

Good luck with whatever solution(s) you decide on.  They need not be
mutually exclusive.

Best,

Aaron

Aaron Trehub
Assistant Dean for Technology and Technical Services
Auburn University Libraries
231 Mell Street, RBD Library
Auburn, AL 36849-5606
Phone: (334) 844-1716
Skype: ajtrehub
E-mail: [email protected]
URL: http://lib.auburn.edu/

