Re: [CODE4LIB] Digital collection backups

Joshua Welker Fri, 11 Jan 2013 07:09:24 -0800

Thanks, Al. I think we'd join a LOCKSS network rather than run multiple LOCKSS 
boxes ourselves. Does anyone have any experience with one of those, like the 
LOCKSS Global Alliance?


Josh Welker


-----Original Message-----
From: Code for Libraries [mailto:[email protected]] On Behalf Of Al 
Matthews
Sent: Friday, January 11, 2013 8:50 AM
To: [email protected]
Subject: Re: [CODE4LIB] Digital collection backups

We use LOCKSS as part of MetaArchive. As I understand it, LOCKSS is typically
spec'd for consumer hardware, so, presumably as a result of the SE Asia
flooding, there have been some drive failures and cache downtime, with
adjustments made accordingly.

That, however, is the worst of it.

LOCKSS is, to some and perhaps even a considerable degree, tamper-resistant,
since it relies on collective polling among multiple copies to preserve
integrity, as opposed to static checksums or some other mechanism.
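To make the polling-versus-checksum distinction concrete, here is a deliberately simplified Python sketch. It is not the actual LOCKSS protocol, which is considerably more elaborate: each cache hashes its copy, the digest held by a strict majority wins, and dissenting caches are repaired from an agreeing peer. The function names and the six cache labels are hypothetical. The point is that a single altered copy is outvoted by the independent copies, rather than being trusted because it matches a checksum stored alongside it.

```python
import hashlib
from collections import Counter

# Simplified illustration only; not the LOCKSS polling protocol.


def digest(data: bytes) -> str:
    """SHA-256 hex digest of one cache's copy of a file."""
    return hashlib.sha256(data).hexdigest()


def poll(replicas: dict[str, bytes]) -> tuple[str, list[str]]:
    """Return (majority digest, caches that disagree with it).

    The digest held by a strict majority of caches is treated as authoritative.
    If no strict majority exists, raise rather than guess.
    """
    votes = {peer: digest(data) for peer, data in replicas.items()}
    winner, count = Counter(votes.values()).most_common(1)[0]
    if count * 2 <= len(votes):
        raise RuntimeError("no majority; manual review needed")
    damaged = sorted(peer for peer, d in votes.items() if d != winner)
    return winner, damaged


def repair(replicas: dict[str, bytes]) -> list[str]:
    """Overwrite minority copies with a copy from a cache that agreed with the majority."""
    winner, damaged = poll(replicas)
    good_peer = next(p for p, data in replicas.items() if digest(data) == winner)
    for peer in damaged:
        replicas[peer] = replicas[good_peer]
    return damaged


if __name__ == "__main__":
    # Six hypothetical caches, echoing the "six or so" locations MA specifies.
    copies = {f"cache{i}": b"scanned page 001" for i in range(1, 7)}
    copies["cache4"] = b"scanned page 00l"  # simulated bit rot or tampering
    print(repair(copies))  # ['cache4']
    print(len({digest(c) for c in copies.values()}))  # 1
```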

As such, it seems important to me to run a LOCKSS box alongside other LOCKSS
boxes; the MetaArchive (MA) cooperative specifies six or so distributed
locations for each cache.

The economic sustainability of such an enterprise is a valid question.
David S H Rosenthal at Stanford seems to lead the charge for this research.

e.g. http://blog.dshr.org/2012/08/amazons-announcement-of-glacier.html#more

I've heard other players mention that they watch MA carefully for such
sustainability considerations, especially because MA uses LOCKSS for
non-journal content, which in some sense may extend LOCKSS beyond its original
design.

In my opinion, MetaArchive has been extremely responsible in designating
succession and disaster-recovery scenarios, going so far as to fund,
develop, and test services for migrating out of the system, into an iRODS
repository in the initial case.


Al Matthews
AUC Robert W. Woodruff Library

On 1/11/13 9:10 AM, "Joshua Welker" <[email protected]> wrote:

>Good point. But since campus IT will be creating regular
>disaster-recovery backups, the odds that we'd ever need to
>retrieve more than a handful of files from Glacier at a time are pretty low.
>
>Josh Welker
>
>
>-----Original Message-----
>From: Code for Libraries [mailto:[email protected]] On Behalf Of 
>Gary McGath
>Sent: Friday, January 11, 2013 8:03 AM
>To: [email protected]
>Subject: Re: [CODE4LIB] Digital collection backups
>
>Concerns have been raised about how expensive Glacier gets if you need 
>to recover a lot of files in a short time period.
>
>http://www.wired.com/wiredenterprise/2012/08/glacier/
>
>On 1/10/13 5:56 PM, Roy Tennant wrote:
>> I'd also take a look at Amazon Glacier. Recently I parked about 50GB 
>> of data files in logical tar'd and gzip'd chunks and it's costing my 
>> employer less than 50 cents/month. Glacier, however, is best for 
>> "park it and forget" kinds of needs, as the real cost is in data flow.
>> Storage is cheap, but must be considered "offline" or "near line" as 
>> you must first request to retrieve a file, wait for about a day, and 
>> then retrieve the file. And you're charged more for the download 
>> throughput than just about anything.
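Roy doesn't say how his chunks were cut, so the following Python sketch assumes one gzip'd tar archive per top-level directory (say, one per collection) and records a SHA-256 digest of each archive, which makes the kind of round-trip check he describes easy to repeat after a retrieval. The directory layout and function names are hypothetical, and the upload step itself (handled by the mt-aws-glacier client in Roy's setup) is left out.

```python
import hashlib
import tarfile
from pathlib import Path

# Hypothetical chunking scheme: one .tar.gz per top-level directory.


def sha256_file(path: Path) -> str:
    """Stream a file through SHA-256 in 1 MiB blocks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()


def pack_collections(source: Path, dest: Path) -> dict[str, str]:
    """Write <name>.tar.gz for each top-level directory of `source` into `dest`.

    Returns {archive filename: SHA-256 hex digest} as a manifest to keep locally.
    """
    dest.mkdir(parents=True, exist_ok=True)
    manifest = {}
    for sub in sorted(p for p in source.iterdir() if p.is_dir()):
        archive = dest / f"{sub.name}.tar.gz"
        with tarfile.open(archive, "w:gz") as tar:
            tar.add(sub, arcname=sub.name)  # adds the directory recursively
        manifest[archive.name] = sha256_file(archive)
    return manifest


def verify(archive: Path, expected: str) -> bool:
    """Round-trip check: does a retrieved archive still match its recorded digest?"""
    return sha256_file(archive) == expected
```

Keeping the manifest outside Glacier means a single retrieved chunk can be checked without downloading anything else.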
>>
>> I'm using a Unix client to handle all of the heavy lifting of 
>> uploading and downloading, as Glacier is meant to be used via an API 
>> rather than a web client.[1] If anyone is interested, I have local 
>> documentation on usage that I could probably genericize. And yes, I 
>> did round-trip a file to make sure it functioned as advertised.
>> Roy
>>
>> [1] https://github.com/vsespb/mt-aws-glacier
>>
>> On Thu, Jan 10, 2013 at 2:29 PM, <[email protected]> wrote:
>>> We built our own solution for this by creating a plugin that works
>>>with our digital asset management system (ResourceSpace) to
>>>individually back up files to Amazon S3. Because S3 is replicated to
>>>multiple data centers, this provides a fairly high level of
>>>redundancy. And because it's an object-based web service, we can
>>>access any given object individually by using a URL related to the
>>>original storage URL within our system.
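One way to keep each backup addressable from its original storage location, sketched in Python: mirror the file's path under the DAM's storage root as the S3 object key, then build the object's virtual-hosted-style URL from that key. The storage root and bucket name are hypothetical, and this isn't necessarily how the ResourceSpace plugin described here works; the upload itself (for example with an AWS SDK) is omitted.

```python
from pathlib import PurePosixPath
from urllib.parse import quote

# Hypothetical values for illustration; not taken from the plugin described above.
STORAGE_ROOT = PurePosixPath("/srv/dam/filestore")
BUCKET = "example-collections-backup"


def object_key(local_path: str) -> str:
    """Mirror the file's path under STORAGE_ROOT as the S3 object key.

    Raises ValueError if the file is not under STORAGE_ROOT.
    """
    return PurePosixPath(local_path).relative_to(STORAGE_ROOT).as_posix()


def object_url(local_path: str) -> str:
    """Virtual-hosted-style S3 URL for the mirrored copy ('/' kept, rest percent-encoded)."""
    return f"https://{BUCKET}.s3.amazonaws.com/{quote(object_key(local_path))}"
```

Because the key is derived mechanically from the local path, the same function can serve both the backup job and the web pages that link to the S3 copy.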
>>>
>>> This also allows us to take advantage of S3 for images on our website.
>>>All of the images in our online collections database are being
>>>served straight from S3, which diverts the load from our public web 
>>>server. When we launch zoomable images later this year, all of the 
>>>tiles will also be generated locally in the DAM and then served to 
>>>the public via the mirrored copy in S3.
>>>
>>> The current pricing is around $0.08/GB/month for 1-50 TB, which I 
>>>think is fairly reasonable for what we're getting. They just dropped 
>>>the price substantially a few months ago.
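A quick back-of-the-envelope check of the rates mentioned in this thread, in Python. It applies the quoted $0.08/GB/month as a flat rate, ignoring any differently priced tiers and request or transfer charges, and the 10 TB collection size is a hypothetical example. The second figure simply divides Roy's "under 50 cents" for about 50GB to get an implied Glacier storage rate.

```python
def monthly_cost(gigabytes: float, rate_per_gb: float) -> float:
    """Storage-only monthly cost at a single flat per-GB rate, rounded to cents."""
    return round(gigabytes * rate_per_gb, 2)


S3_RATE = 0.08  # $/GB/month, the 1-50 TB rate quoted above

# Hypothetical 10 TB collection (10 * 1024 GB) at the flat rate.
print(monthly_cost(10 * 1024, S3_RATE))  # 819.2

# Roy: roughly 50 GB in Glacier for under $0.50/month implies a rate under this.
print(0.50 / 50)  # 0.01
```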
>>>
>>> DuraCloud http://www.duracloud.org/ supposedly offers a way to add 
>>>another abstraction layer so you can build something like this that 
>>>is portable between different cloud storage providers. But I haven't 
>>>really looked into it yet.
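The following Python sketch shows the general idea of such an abstraction layer: the backup code talks to a small interface, and each cloud provider sits behind an adapter, so switching or adding providers doesn't touch the backup logic. It is not DuraCloud's API; StorageProvider, InMemoryProvider, back_up and restore are hypothetical names, and the in-memory class stands in for real S3 or Glacier adapters so the example runs offline.

```python
from typing import Protocol

# Hypothetical interface for illustration; not DuraCloud's API.


class StorageProvider(Protocol):
    """Minimal interface a backup plugin could code against instead of one vendor's SDK."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryProvider:
    """Stand-in for a real backend (S3, Glacier, etc.) so the sketch runs offline."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]  # raises KeyError if absent


def back_up(key: str, data: bytes, providers: list[StorageProvider]) -> None:
    """Write the same object to every configured provider."""
    for provider in providers:
        provider.put(key, data)


def restore(key: str, providers: list[StorageProvider]) -> bytes:
    """Return the object from the first provider that has it."""
    for provider in providers:
        try:
            return provider.get(key)
        except KeyError:
            continue
    raise KeyError(key)
```

A real adapter would translate its SDK's "not found" error into KeyError (or whatever convention the interface settles on) so that restore can fall through to the next provider.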
>
>
>--
>Gary McGath, Professional Software Developer http://www.garymcgath.com

