Thanks for bringing up the cost of verifying that the data stays 
consistent. We will be using DSpace for now, and DSpace has some 
checksum functionality built in out of the box. It shouldn't be too difficult 
to write a script that loops through DSpace's checksum data and compares it 
against the files in Glacier. According to the Glacier FAQ on Amazon's site, 
Amazon provides a vault inventory (updated daily) that can be downloaded as 
JSON, and some users report that this inventory includes checksum data. So 
hopefully it will just be a matter of comparing the local checksum to the 
Glacier checksum, which would be easy enough to script.
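A minimal sketch of that comparison script, under a few stated assumptions. It
assumes the downloaded vault inventory uses Amazon's documented layout (an
"ArchiveList" whose entries carry "ArchiveDescription" and "SHA256TreeHash"),
and it assumes, as a hypothetical convention, that each archive's description
was set to the file's local path at upload time. One wrinkle: DSpace records
an MD5 checksum by default, while Glacier reports a SHA-256 tree hash, so the
sketch recomputes the tree hash from the local file rather than reusing the
stored DSpace value. The function names are illustrative, not part of any API.

```python
import hashlib
import json
from pathlib import Path

MIB = 1024 * 1024  # Glacier computes tree hashes over 1 MiB chunks


def leaf_hashes(stream):
    """SHA-256 digest of each 1 MiB chunk read from a binary stream."""
    hashes = [hashlib.sha256(chunk).digest()
              for chunk in iter(lambda: stream.read(MIB), b"")]
    # An empty input still yields one leaf: the hash of zero bytes.
    return hashes or [hashlib.sha256(b"").digest()]


def combine(hashes):
    """Hash adjacent pairs level by level until one root digest remains."""
    level = list(hashes)
    while len(level) > 1:
        paired = [hashlib.sha256(level[i] + level[i + 1]).digest()
                  for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            paired.append(level[-1])  # an odd leftover is promoted unchanged
        level = paired
    return level[0].hex()


def tree_hash_file(path):
    """Glacier-style SHA-256 tree hash of a local file, streamed in chunks."""
    with open(path, "rb") as f:
        return combine(leaf_hashes(f))


def audit(inventory_path, local_root):
    """Return (description, problem) pairs for archives that fail the check."""
    inventory = json.loads(Path(inventory_path).read_text())
    problems = []
    for archive in inventory["ArchiveList"]:
        # Hypothetical convention: ArchiveDescription is the path under local_root.
        name = archive["ArchiveDescription"]
        local = Path(local_root) / name
        if not local.is_file():
            problems.append((name, "missing locally"))
        elif tree_hash_file(local) != archive["SHA256TreeHash"]:
            problems.append((name, "checksum mismatch"))
    return problems
```

Keying on the archive description is only one option; a production script
would more likely keep its own table mapping DSpace bitstreams to Glacier
archive IDs, since descriptions are free text.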

Josh Welker


-----Original Message-----
From: Code for Libraries [mailto:[email protected]] On Behalf Of Ryan Eby
Sent: Friday, January 11, 2013 11:37 AM
To: [email protected]
Subject: Re: [CODE4LIB] Digital collection backups

As Aaron alludes to, your decision should be based on your real needs, and 
the options might not be mutually exclusive.

LOCKSS/MetaArchive might be worth the money if the community archival aspect 
is what you are going for. Depending on your institution, participating might 
make political or mission sense regardless of your storage needs, and it could 
be limited to a specific collection where it makes sense.

Glacier is a great choice if you want to spread a backup across 
regions. S3 is similar, with the added option of CloudFront (Amazon's CDN) 
to take load off your institution's server (you can now use CloudFront 
with your own origin server as well). Depending on your bandwidth, this might be 
worth the money regardless of LOCKSS participation (which can be more of a dark 
archive). Amazon has also tended to lower prices over time rather than raise 
them, but as with any outsourcing, you have to plan for the possibility that 
the service might not exist in the future. Also look more closely at Glacier 
pricing in terms of checking your data for consistency. There have been a few 
papers on the cost of verifying that Amazon really holds the correct data, 
which depends on how often your requirements say you must check.
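As a rough illustration of why check frequency drives that cost, here is a
back-of-envelope sketch of how much data a fixity-audit schedule pulls back
out of Glacier each year. The per-GB price is a placeholder parameter, not an
actual Amazon rate, and the sketch ignores per-request fees and any free
retrieval allowance; the function names are hypothetical.

```python
def annual_audit_retrieval_gb(stored_gb, audits_per_year, sample_fraction=1.0):
    """GB retrieved per year if each audit re-reads sample_fraction of the data."""
    if not 0.0 <= sample_fraction <= 1.0:
        raise ValueError("sample_fraction must be between 0 and 1")
    return stored_gb * sample_fraction * audits_per_year


def annual_audit_cost(stored_gb, audits_per_year, price_per_gb, sample_fraction=1.0):
    """Retrieval cost per year; price_per_gb is a placeholder, not a real rate."""
    return annual_audit_retrieval_gb(stored_gb, audits_per_year, sample_fraction) * price_per_gb
```

For example, a full audit of a 2,000 GB collection every quarter retrieves
8,000 GB a year, while spot-checking a 10% sample every month retrieves
2,400 GB: sampling trades thoroughness for a smaller retrieval bill.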

Another option, if you are just looking for more geographic distribution, is 
to find an institution or service provider that will colocate with you. There 
may be another small institution that would love to put a cheap box full of 
hard drives on your network in exchange for doing the same on theirs. It is not 
as involved or formal as LOCKSS, but it gives you something you control that 
satisfies your requirements. It could even be as low-tech as shipping SSDs to 
another institution, which then verifies the BagIt checksums on the drive.
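For the ship-a-drive option, the receiving institution's check amounts to
re-hashing every payload file and comparing it against the bag's manifest.
A minimal standard-library sketch, assuming a bag with a manifest-sha256.txt
in the BagIt layout (one "<checksum> <path>" line per file, paths relative to
the bag root, payload under data/). It skips the rest of full BagIt validation
(bagit.txt, tag manifests, Payload-Oxum, percent-encoded paths), which a
library such as the Library of Congress's bagit-python handles.

```python
import hashlib
from pathlib import Path


def file_sha256(path):
    """SHA-256 hex digest of a file, read in 1 MiB blocks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1024 * 1024), b""):
            h.update(block)
    return h.hexdigest()


def verify_bag(bag_dir):
    """Return (path, problem) pairs for payload files that fail the manifest."""
    bag = Path(bag_dir)
    problems = []
    listed = set()
    for line in (bag / "manifest-sha256.txt").read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        # Each line is "<checksum> <relative path>"; the path may contain spaces.
        expected, rel_path = line.split(maxsplit=1)
        listed.add(rel_path)
        target = bag / rel_path
        if not target.is_file():
            problems.append((rel_path, "missing"))
        elif file_sha256(target) != expected.lower():
            problems.append((rel_path, "checksum mismatch"))
    # Payload files that the manifest does not mention also fail validation.
    for path in sorted((bag / "data").rglob("*")):
        rel = path.relative_to(bag).as_posix()
        if path.is_file() and rel not in listed:
            problems.append((rel, "not in manifest"))
    return problems
```

An empty result means every file on the drive arrived intact; anything else
identifies exactly which files to re-ship.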

All of the above should be scriptable as part of your workflow. You just need 
to decide what you really want out of it.

Eby


On Fri, Jan 11, 2013 at 11:52 AM, Aaron Trehub <[email protected]> wrote:

> Hello Josh,
>
> Auburn University is a member of two Private LOCKSS Networks: the 
> MetaArchive Cooperative and the Alabama Digital Preservation Network 
> (ADPNet).  Here's a link to a recent conference paper that describes 
> both networks, including their current pricing structures:
>
> http://conference.ifla.org/past/ifla78/216-trehub-en.pdf
>
> LOCKSS has worked well for us so far, in part because supporting 
> community-based solutions is important to us.  As you point out, 
> however, Glacier is an attractive alternative, especially for 
> institutions that may be more interested in low-cost, low-throughput 
> storage and less concerned about entrusting their content to a 
> commercial outfit or having to pay extra to get it back out.  As with 
> most things, you pay your money--more or less, depending--and make your 
> choice.  And take your risks.
>
> Good luck with whatever solution(s) you decide on.  They need not be 
> mutually exclusive.
>
> Best,
>
> Aaron
>
> Aaron Trehub
> Assistant Dean for Technology and Technical Services
> Auburn University Libraries
> 231 Mell Street, RBD Library
> Auburn, AL 36849-5606
> Phone: (334) 844-1716
> Skype: ajtrehub
> E-mail: [email protected]
> URL: http://lib.auburn.edu/
>
>
