Hi Chuck...
I'm reviewing similar issues on a much smaller scale at the Space
Centre and Museum here in Vancouver BC.
Here are some random thoughts on my efforts.
FYI: we're a non-profit operating with minimal budgets and resources
here.
Enterprise class Solution Providers need not apply! :-)
(...open source hackers get free coffee and cookies)
Our main shared resource is a single shared directory tree which
contains everything from planetarium visuals to accounting's Excel
spreadsheets.
I use disk based backup with a 10 cartridge rotation. The entire tree is
backed up daily. Using commodity IDE (or SATA) hard disks is very cost
effective. Blows tape systems out of the water regarding cost, speed,
random access, flexibility. But a single tier system like this will
inevitably run out of space eventually, so I'm looking to develop a more
sophisticated model.
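For anyone who wants to picture the rotation, here is a minimal sketch. It assumes a plain round-robin with one cartridge per calendar day, weekends included; the function names are made up for illustration and the real schedule may differ.

```python
from datetime import date

CARTRIDGES = 10  # rotation size mentioned above


def cartridge_for(day: date, cartridges: int = CARTRIDGES) -> int:
    """Return the 1-based cartridge number for a given day (assumed round-robin)."""
    return day.toordinal() % cartridges + 1


def oldest_restorable(day: date, cartridges: int = CARTRIDGES) -> date:
    """Oldest daily backup still on a cartridge once today's backup has run."""
    return date.fromordinal(day.toordinal() - (cartridges - 1))


if __name__ == "__main__":
    today = date(2006, 8, 21)
    print("cartridge:", cartridge_for(today))
    print("oldest restorable day:", oldest_restorable(today))
```

With 10 cartridges, each cartridge is overwritten every 10 days, so a file deleted by mistake can be recovered for up to 9 days after its last good backup.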
My current line of thinking is to retain the single tree for simplicity.
Users (all of them ...not just the technophiles) need to understand
something before they can use it. I'm intending to add a separate
archive area. This will be on a separate disk volume. The main directory
tree will be scanned nightly, and any file not even looked at for, say,
6 months will be moved to the same relative path in the archive. The
archive will probably be made read-only: I may give users direct access
to it, and read-only access would stop them from modifying the contents.
I want that data static.
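A rough sketch of what the nightly scan might look like, in Python. The function name, the 183-day cutoff and the paths in the usage note are placeholders, not a finished tool. It also assumes the file system actually records access times, and that the daily backup does not itself reset them by reading every file; both would need checking before relying on this.

```python
import os
import shutil
import time
from pathlib import Path

MAX_IDLE_DAYS = 183  # "say, 6 months" (placeholder value)


def archive_stale_files(main_root, archive_root, max_idle_days=MAX_IDLE_DAYS, now=None):
    """Move files not accessed within max_idle_days to the same relative
    path under archive_root, then mark them read-only."""
    main_root, archive_root = Path(main_root), Path(archive_root)
    cutoff = (time.time() if now is None else now) - max_idle_days * 86400
    moved = []
    for dirpath, _dirnames, filenames in os.walk(main_root):
        for name in filenames:
            src = Path(dirpath) / name
            if src.stat().st_atime >= cutoff:
                continue  # looked at recently, leave it in the main tree
            dest = archive_root / src.relative_to(main_root)
            if dest.exists():
                continue  # never overwrite archived data; handle by hand
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(dest))  # copies across volumes if needed
            os.chmod(dest, 0o444)  # read-only: the archive is meant to be static
            moved.append(dest)
    return moved
```

It could be run from a nightly scheduled job as, for example, `archive_stale_files("/shared", "/archive")`, where both paths are hypothetical and the archive sits outside the main tree.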
Right now I'm thinking of maintaining 3 copies of the archive. That's a
big deal, as with the 10 cartridge rotation on the main directory tree,
we need 10 GB of media for every 1 GB of working space. That really
holds us back from exploiting cheap disk space to the fullest. With this
archive system, we'll only need 3, so all things being equal we'll have
3 (ok, 3.33...) times the archive space on the same hardware budget. The
three rotating copies will be 1 online, 1 physically secure on-site,
and 1 in a safety deposit box (size 2) at the bank we do our cash run
with. I've also considered another step: as content is moved to the
archive, a copy of the new stuff is buffered in a separate area.
When exactly one DVD's worth of stuff has arrived, it's burnt to DVD as
extra insurance. Not sure this step is worth the trouble. Only 3 copies
leaves me instinctively nervous when I'm accustomed to 10, but that is
purely psychology. I'm telling myself the chances of 3 drives failing
simultaneously must be remote (remember: 2 are offline, 1 of them off-site). Still, my
instincts aren't quite satisfied. Intellectually, I feel burning the
DVDs is less cost effective and less flexible than simply getting more
hard drives. Hard disks are hard to beat for $/GB and optical storage
never seems to catch up, though with each new generation of CD
technology it closes the gap for a while. Even 8.5 GB on a
dual-layer DVD isn't looking very big anymore (my cheap IDE disk
cartridges are 300GB). The labor and logistics of doing the DVD burn are
not welcome either. And of course optical disks are not famous for the
kind of reliable, long-term stability you'd bet your institution on.
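If the DVD step were ever implemented, the buffering logic could be sketched roughly as below. The 4.7 GB figure is the nominal single-layer capacity and is my assumption, as are the function name and the (path, size) tuples. The burn itself is left out.

```python
DVD_BYTES = 4_700_000_000  # nominal single-layer DVD capacity (assumption)


def take_dvd_batch(pending, capacity=DVD_BYTES):
    """pending is a list of (path, size_in_bytes) in arrival order.

    Returns (batch, remaining). batch is None until the buffer holds at
    least one disc's worth; then it is the files, taken in arrival order,
    that fit on one disc. Files larger than a disc are never selected and
    would need splitting by some other means.
    """
    if sum(size for _path, size in pending) < capacity:
        return None, pending  # not a full disc yet, keep buffering
    batch, remaining, used = [], [], 0
    for path, size in pending:
        if used + size <= capacity:
            batch.append((path, size))
            used += size
        else:
            remaining.append((path, size))
    return batch, remaining
```

Calling this after each nightly archive run would yield a burn list only when there is enough to fill a disc, and whatever is left over simply waits for the next batch.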
I'm also considering another class of data: extremely bulky data.
Examples would be planetarium production files (can be HUGE) and
collections digitization and cataloging (I have a conservator who's very
busy with a shiny new digital camera right now). I'd really like to find
a storage solution that doesn't need 10 rotation copies as that would be
prohibitive given the size I want to achieve. But it has to be safely
backed up.
I'm considering maybe two mirrored copies online (different ends of the
building, UPSs etc), a third offline locally, and a fourth off-site. The
last two are essential to protect from a) a system-wide event and b)
destruction of the building(!).
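To put numbers on the copy counts discussed in this post (10 rotation cartridges, 3 archive copies, 4 copies of the bulk store), here is a small calculation. The 3000 GB media budget is a made-up figure, equal to ten 300 GB cartridges.

```python
def usable_gb(media_budget_gb, copies):
    """Usable space when every byte must exist on `copies` separate media."""
    return media_budget_gb / copies


if __name__ == "__main__":
    budget = 3000  # hypothetical: ten 300 GB cartridges
    schemes = [
        ("10-cartridge rotation", 10),
        ("3-copy archive", 3),
        ("4-copy bulk store", 4),
    ]
    for label, copies in schemes:
        print(f"{label}: {usable_gb(budget, copies):.0f} GB usable")
```

On the same budget the 3-copy archive holds 10/3 (about 3.33) times what the 10-cartridge rotation can, and the 4-copy scheme 2.5 times.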
The problem I've not answered yet is volume size. I want to use JBOD or
software RAID to build big, easily scalable disk volumes with multiple,
cheap commodity disks. No problem for the online copies, but how that
could work for the offline and off-site copies is not obvious. Working
on it.
BTW: data size probably rules out online backup to an offsite service
provider due to bandwidth costs.
Hope that's interesting to some of you ...I'd love to share any ideas
the rest of you may have. Cheap, flexible and secure storage is an issue
many of us must be thinking about.
Regards,
David Marsh
==========================================
David Marsh
System Administrator
H.R. MacMillan Space Centre
Vancouver Museum
1100 Chestnut Street, Vancouver, BC V6J 3J9
E sysadmin at hrmacmillanspacecentre.com
sysadmin at vanmuseum.bc.ca
T (604) 736 4431 ext. 5507
C (604) 813 9667
===========================================
-----Original Message-----
From: mcn-l-bounces at mcn.edu [mailto:mcn-l-bounces at mcn.edu] On Behalf Of
Chuck Patch
Sent: Monday, August 21, 2006 2:18 PM
To: mcn-l at mcn.edu
Subject: [MCN-L] File Storage Best Practices Redux
Last January someone posted a query asking about best practices for file
storage across the spectrum of applications run on their system. The
only reply related to the archival storage of images, but I don't
think that was the question and I find myself asking the same one now.
Have any of you defined policies for data classification at your
institutions? What types of priorities do you give to different types
of data? Do you have retention schedules for stored digital files (of
any type -- images, office productivity, etc.)? How do you
partition/allocate your online storage? Do you give people set amounts
of "scratch space" to use at their own discretion?
We now have a big old EMC NAS with about 3 TB of space as well as a
variety of NAS and attached RAID units that will all be used within a
year if we don't start putting limits on what gets parked there, so I'm
very interested in real-world experiences in developing policies and
managing space.
Chuck Patch
The Historic New Orleans Collection
_______________________________________________
You are currently subscribed to mcn-l, the listserv of the Museum
Computer Network (http://www.mcn.edu)
To post to this list, send messages to: mcn-l at mcn.edu
To unsubscribe or change mcn-l delivery options visit:
http://toronto.mediatrope.com/mailman/listinfo/mcn-l