A while ago I asked people about their backup policies.  I am going to post all
of the responses I got here (with permission from the authors) in case anyone
else is interested.

Original posting:


>At the "umich.edu" cell, we are interested in how other folks have offered
>backup service to their user community.  This is our situation:
>
>- weekly full backups
>- daily incrementals
>
>We save a copy of the fulls permanently every 4 months and start over on the
>backup database.  We offer to restore volumes for users that request it.  The request
>is answered within 24 hours (the actual restore itself within 48 hours).  We
>offer this service to our user community even though our primary goal of
>backup is disaster recovery. 
>
>I'm really curious whether other administrators have a faster turnaround policy for
>user restore requests, whether they charge for them, whether they don't provide them, etc.
>We are in the middle of doing a reality check to see if what we are offering is
>reasonable.
>
>Mark Giuffrida
>Univ of Michigan, ITD
>[EMAIL PROTECTED]




---------------------------------------------

My site has about 200 Suns, with about the same number of users.  The
user base is heavily weighted in favor of scientists, mathematicians,
and programmers.  We maintain about 40 GB of disk, mainly on five 4/690
fileservers.  We're evaluating AFS, but haven't taken the plunge, so
this is all accessed via NFS and backed up with dump.  We generally
respond to a file restore request within an hour and have the file
restored within 4 hours.  If the user doesn't have a very good idea of
when the file was last modified, it takes longer because we need to
search several tapes.  If the file is on the previous night's daily
dump, the restore can happen within minutes since it's usually still
in one of the drives.

Considering the probable size of your user population and the fact
that the AFS backup system forces you to restore entire volumes, your
policy is probably reasonable.

Paul Allen
BCS Research & Technology Computing Environment
[EMAIL PROTECTED]

---------------------------------------------

i realize that a commercial environment is different from an educational one,
but wanted to give you another point of reference.

daily incrementals kept for 30 days
weekly fulls kept for 90 days
one weekly a month is called monthly and kept for 18 months
one monthly a quarter is called quarterly and kept "forever" offsite
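
The tiered retention above can be sketched in Python.  This is a minimal,
hypothetical illustration, not krishnan's actual tooling: the posting does not
say which weekly becomes the "monthly", so the sketch assumes it is the first
weekly full of each month, and that the monthly taken in the first month of a
calendar quarter is the one promoted to "quarterly".  The function names
(add_months, classify_full, expiry) are invented for illustration.

```python
import calendar
from datetime import date, timedelta
from typing import Optional


def add_months(d: date, months: int) -> date:
    """Shift d by whole calendar months, clamping to the last day of the month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def classify_full(d: date) -> str:
    """Tier of a weekly full taken on date d (hypothetical promotion rule)."""
    if d.day <= 7:  # first run of this weekday in the month -> "monthly"
        # Assumption: the monthly from a quarter's first month becomes "quarterly"
        return "quarterly" if d.month in (1, 4, 7, 10) else "monthly"
    return "weekly"


def expiry(kind: str, taken: date) -> Optional[date]:
    """Date a backup of the given tier may be recycled; None means never."""
    if kind == "daily":
        return taken + timedelta(days=30)  # dailies kept 30 days
    if kind == "weekly":
        return taken + timedelta(days=90)  # weeklies kept 90 days
    if kind == "monthly":
        return add_months(taken, 18)  # monthlies kept 18 months
    if kind == "quarterly":
        return None  # kept "forever" offsite
    raise ValueError(f"unknown tier: {kind}")
```

For example, classify_full(date(1993, 2, 5)) returns "monthly", and under the
18-month rule that tape expires on 1994-08-05.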

restores are same day as long as the tapes are on site or within a week
if they are offsite.  currently we have the luxury of not having to
charge back.

        - krishnan
[EMAIL PROTECTED]

---------------------------------------------

We do daily incrementals, and monthly full backups.  The fulls are stored
offsite for up to a year.  We always have about 3 months of incrementals on
hand.  We do a shit load of restores, sometimes as many as 6 a day.  Most
are done in a couple of hours or so.

We have a special monitoring program that runs on all of our client
machines detecting hardware problems.  When we detect a hardware problem we
back the data up to disk; we avoid large restores from tape like crazy.

We also run AFS, so we occasionally do AFS restores. Not as many as our
so-called NFS backups :-).  In general, if a user asks for a file or
directory to be restored we can get it back within an hour or so.

We don't charge for them.  I have one person who runs both AFS and NFS
backups; she does the bulk of the restores, with help when she gets overloaded.


Mike Dugan
Manager, Systems Management
Project Agora

---------------------------------------------

My cell is much smaller than yours (~1500 users, <15 GB of AFS space), but
then there's only me to administer it...  In general, I manage a 24-hour
turnaround on restore requests at no charge.  I do monthly fulls of all the
user data, weekly incrementals off the monthlies, and dailies off the
weeklies, so data isn't around forever.  In addition, I take a full cell
snapshot once a month, which I keep for a year.  Not the best policy, but it
works fairly well.
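
Because each incremental in a scheme like this is taken relative to the most
recent lower-level backup, a restore needs a chain of tapes: the monthly full,
the newest weekly incremental taken since that full (if any), and the newest
daily taken since that.  A minimal Python sketch of picking that chain,
assuming dump-style levels (0 = monthly full, 1 = weekly, 2 = daily); the
Dump type and restore_chain function are hypothetical and not part of AFS or
dump.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Dump:
    day: date
    level: int  # 0 = monthly full, 1 = weekly incremental, 2 = daily incremental


def restore_chain(dumps: List[Dump], target: date) -> List[Dump]:
    """Return the dumps to apply, oldest first, to rebuild data as of target.

    Walks backwards from the newest dump on or before target, keeping each
    dump whose level is lower than the last one kept, until a full is reached.
    """
    candidates = sorted((d for d in dumps if d.day <= target), key=lambda d: d.day)
    chain: List[Dump] = []
    ceiling: Optional[int] = None  # the next kept dump must have a lower level
    for d in reversed(candidates):
        if ceiling is None or d.level < ceiling:
            chain.append(d)
            ceiling = d.level
            if d.level == 0:  # reached the full backup; chain is complete
                break
    chain.reverse()
    return chain
```

A daily taken before the month's first weekly is, in this model, relative to
the monthly full, and the backwards walk handles that case without any
special logic.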

Pat Wilson
Systems Manager, Unix Workstation Environment
Dartmouth College

---------------------------------------------


We offer the same level of service, but our turnaround is slightly
better (if the request for a restore comes in overnight or up until
around 3 or 4 p.m., we can generally fulfill it the same day).

But our user base is probably smaller than yours (ours is just
Computer Science, Math, Psychology, a student lab and two
University-affiliated research centers).

[EMAIL PROTECTED] (H Morrow Long)

---------------------------------------------


We save for three months.  
nile 78% backup listdumps
/weekly  expires in  3m
    /monday  expires in  7d
    /tuesday  expires in  7d
    /wednesday  expires in  7d
    /thursday  expires in  7d
    /friday  expires in  7d
    /saturday  expires in  7d

I agree that the primary goal is disaster recovery.  I also like redundancy:
for the user volumes we also `vos dump' them, compress the dump files, and
save them on an extra disk.  This way a volume can be restored more quickly,
since restoring from tape can take a while.  We also keep the .backup volumes
so users can recover files themselves, if they catch the problem before the
next .backup volume is created.

I think your parameters are totally reasonable.  We've also given users other
tools (mass storage).  The help desk operators do restores, and requests are
honored within 24 hours.
Ral
[EMAIL PROTECTED] (Ral Geis)


---------------------------------------------

        Although we are fledglings in AFS, we have used backups in general
for a long time.  Our policy is to make an incremental of ALL users' files
every night and a full backup once a week.  System files are backed up as
needed, since most are vendor files and the home-spun ones are placed in
mass-storage backup by the individual system developers.  There is no
direct charge for either backup or recovery services.  Recovery is usually
within eight hours, sometimes less and sometimes up to twenty-four hours,
depending on the current backup schedule, i.e. tape drive availability.

        I, for one, could NOT live without a backup system.  I've been saved
too many times from being lynched by users after losing their files due to
either hardware or administrative snafus.  For example, we are assured that
a user is no longer using our center and that their files (many megabytes)
are worthless, so we dump them.  The next day the user comes back from a
three-month vacation, screaming for the files.  Or a directory transfer
appears to have been successful and the original space is cleared.  Then we
find a hardware/software glitch: half the files were corrupted with no
warning and no checksum.  The latter problem was caused by a perfectly good
operating system being "upgraded" and losing its checking features.

        Better than backup is a true Archive and Firemaster System.  This
is a system in which anything that goes in STAYS in until the tape fades
away.  Dead projects can be revised years later with all programs and data
intact, or recovered in the event of a fire or other disaster, although
that can take some time while hardware is replaced or alternate sites are
arranged.  Tape storage is very cheap.  Without such a system, regenerating
the project programs and data would be very costly in both time and money.

Don Doering
[EMAIL PROTECTED] (Don Doering)

---------------------------------------------

The Advanced Laboratory Workstation Project (ALW) at the National Institutes 
of Health (alw.nih.gov) uses a somewhat more complex backup schedule. We divide
volumes into four categories:

1. Daily Incremental/Weekly Full -- user home volumes
                                 user data volumes
                                 software source volumes

2. Weekly Incremental/Monthly Full -- most system volumes
                                   some data volumes

3. Monthly Full -- large (gigabytes), stable databases

4. No backup -- log areas, temporary areas, other private volumes

The Daily Incremental/Weekly Full backup schedule consists of

   * 5 daily incremental tapes rewritten each week
   * 3 weekly full tapes rewritten each month
   * 2 monthly full tapes rewritten each quarter
   * 4 quarterly full tapes rewritten each year

The Weekly Incremental/Monthly Full backup schedule consists of

   * 3 weekly incremental tapes rewritten each month
   * 2 monthly full tapes rewritten each quarter
   * 4 quarterly full tapes rewritten each year

The Monthly Full backup schedule consists of

   * 2 monthly full tapes rewritten each quarter
   * 4 quarterly full tapes rewritten each year
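
The tape counts above fit a classic grandfather-father-son rotation in which
the last full of each period is promoted to the next tier: five dailies plus
one weekly full per week, three weeklies plus one monthly per month, two
monthlies plus one quarterly per quarter.  The Python sketch below models the
Daily Incremental/Weekly Full schedule only, under assumptions that are ours
rather than ALW's: an idealized calendar of six backup slots per week, four
weeks per month and three months per quarter, with the final slot of each
period promoted.  The posting does not say which slot is promoted, and the
tape labels (D1, W1, M1, Q1, ...) and function names are hypothetical.

```python
from typing import Set


def tape_for(quarter: int, month: int, week: int, day: int) -> str:
    """Tape label for one backup slot in an idealized year (hypothetical model).

    quarter: 0-3, month within quarter: 0-2, week within month: 0-3,
    day within week: 0-5 (slot 5 is the weekly full backup).
    """
    if day < 5:
        return f"D{day + 1}"  # 5 daily incremental tapes, reused every week
    if week < 3:
        return f"W{week + 1}"  # 3 weekly full tapes, reused every month
    if month < 2:
        return f"M{month + 1}"  # 2 monthly full tapes, reused every quarter
    return f"Q{quarter + 1}"  # 4 quarterly full tapes, reused every year


def tapes_in_rotation() -> Set[str]:
    """Every distinct tape label written over one idealized year."""
    return {
        tape_for(q, m, w, d)
        for q in range(4)
        for m in range(3)
        for w in range(4)
        for d in range(6)
    }
```

Under this model the schedule cycles through 14 distinct tapes (5 + 3 + 2 + 4)
per year, not counting the permanent off-site copies; the other two schedules
drop the lower tiers in the same way.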

In addition, as a Federal data site we are required to keep permanent
off-site copies of all data. The backups are performed by operations
staff who are on duty for sixteen hours, six days a week. (AFS backups
presently take about half that time).

ALW restore policy is to provide user data recovery within 48 hours.
File server disk recovery, of course, takes top priority and we use
a "triage" system to get critical volumes back on-line as soon as
possible; most "critical" volumes are user volumes, which are not replicated.

Since we began using AFS a few years ago, we have received surprisingly
few requests for data recovery from users. I suspect this is because the
majority of mishaps can be recovered from using the backup clone. During
that time we have recovered from 4 disk crashes, and we have always been
successful in recovering data after losing a disk.

ALW recently went onto "cost-recovery" under which we charge client 
workstations an annual connect fee and we charge users for disk space.
The rates are set to cover system services including client support,
server support, backups, and data recovery.

        Sandy Orlow (Systex, Inc.)      phone: (301) 496-5362
        Building 12A, Room 2023         uucp: uunet!sandy%alw.nih.gov
        National Institutes of Health   Internet: [EMAIL PROTECTED]
        Bethesda, MD 20892              FAX: (301) 402-2867
