What:  Add an option to the storage daemon so it can automatically go offline
(eject the tape, i.e. issue the IOCTL offline command) when polling for a volume other
than the one currently loaded (the scope is single archive devices, not
autochangers).
(I know about the OfflineOnUnmount and AlwaysOpen options, but they don't do the
trick... see below.)

Why:   I need to implement a solution where the operator of the tape drive needs
*_no interaction at all_* with bconsole or any other program (not
because he is a stupid guy, but because this is not his job!).
       To achieve this, the conventions in use are as follows:
       Every morning, the operator goes to the archive system location (in fact,
a different physical location for each archive system, for security reasons) to check
the drive, possibly change the tape (this is done on a schedule defined by the
admin), and visually check the drive's state:
         1/ If the tape is still in the drive:
           a/ and no activity LED is on/blinking, then the last backup(s)
were successful (or did not run at all, but in that case it is the admin's business), so
 (if it is time to change the tape) push the eject button, take the tape,
store it in a safe off-site place, then insert and load the new tape.
           b/ and an activity LED is on/blinking on the drive: backup in
progress, come back later...
           c/ and the error LED is on/blinking: call the admin.
         2/ If the tape has been ejected from the drive:
           Oops, the Bacula SD needs a new/different tape to finish its job(s) (possible
reasons: no space left on the current tape, MaxUse and/or retention period
expired, wrong tape inserted last time, ...), so call the admin to resolve the
issue.
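The operator's checklist is effectively a small decision table. The sketch below encodes it in C; the struct, enum and function names are hypothetical, and it assumes the error LED takes priority over the activity LED, which the checklist does not state. It shows that, once the SD ejects the tape itself, "no tape in the drive" alone is enough to send the operator to the admin.

```c
#include <stdbool.h>

/* What the operator can see at the drive (hypothetical model). */
struct drive_view {
    bool tape_in_drive;
    bool activity_led;   /* activity LED on or blinking */
    bool error_led;      /* error LED on or blinking */
};

enum operator_action {
    ACTION_SWAP_TAPE_IF_SCHEDULED,   /* 1/a: backups done, change the tape if due */
    ACTION_COME_BACK_LATER,          /* 1/b: backup in progress */
    ACTION_CALL_ADMIN                /* 1/c or 2/: the admin must intervene */
};

enum operator_action decide(struct drive_view v)
{
    if (!v.tape_in_drive) {
        /* 2/: the SD ejected the tape because it needs another volume. */
        return ACTION_CALL_ADMIN;
    }
    if (v.error_led) {
        return ACTION_CALL_ADMIN;            /* 1/c (assumed to win over 1/b) */
    }
    if (v.activity_led) {
        return ACTION_COME_BACK_LATER;       /* 1/b */
    }
    return ACTION_SWAP_TAPE_IF_SCHEDULED;    /* 1/a */
}
```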

       In other words, from the operator's point of view, if the tape has been ejected
from the drive then there was a serious problem in the last backup, so call
the admin.
       In fact, I need a way to implement a logical 3-state indicator {backup
OK, backup failed, need another tape to finish my work} with only a physical
2-state device {tape loaded in the drive, tape ejected}. [And I
_don't want_ to write various AdminJobs / external cron scripts / ... to achieve
that (complexity, i.e. sequencing external scripts with internal jobs, ... is
always the best recipe ... for failure!)]

This setup has been working well for me for about 1 month now in a pseudo-production
environment with the following (all in v2.0.3 + modified SD (stored) sources):
  - about 50 client FDs {Win32 [XP, 2k Pro, 2k Srv, NT4 Srv], Linux [Mandriva
Workstation, Mandriva servers (Lotus Domino, Samba, ...)]},
  - 1 Director (with a MySQL catalog),
  - 2 Storage daemons,
  - at least 1 daily incremental backup every night per client FD,
  - at least 1 full backup every week per client FD,
  - 4 different pools,
  - 20 concurrent jobs _and_ data spooling ON.
  The Director and each Storage daemon are physically located on
separate servers.
  The tape drives are 1 DDS5 and 1 LTO2 (also tested successfully with various
'old' DDS2-4, LTO1 and DLT drives).
  The relevant parts of the SDs' configuration files are:
    Autochanger = no
    AutomaticMount = yes
    RemovableMedia = yes
    RandomAccess = no
    AlwaysOpen = no # needed for the tapealert/smartctl to work
    OfflineOnUnMount = no
    CloseOnPoll = no
    OfflineOnPoll = yes # <<== New option
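To make the intended behaviour of OfflineOnPoll = yes concrete, here is a small C model of what happens when the SD, while polling, reads a label other than the one it is waiting for. All names are hypothetical, and so is the autochanger bit value; only the 1<<23 value for the new flag comes from the dev.h diff. In this model the wrong tape is ejected once, the SD keeps waiting while the drive is empty, and the right tape is accepted when it is inserted. It models the condition added to mount.c; it is not the SD's actual polling code.

```c
#include <stdint.h>
#include <string.h>

/* Capability bits for this model only. SIM_CAP_OFFLINEONPOLL reuses the
 * value from the dev.h diff; SIM_CAP_AUTOCHANGER is an arbitrary placeholder. */
#define SIM_CAP_AUTOCHANGER   (1u << 0)
#define SIM_CAP_OFFLINEONPOLL (1u << 23)

/* Hypothetical, heavily simplified stand-in for the SD's DEVICE. */
struct sim_dev {
    uint32_t cap_bits;
    const char *loaded;   /* label of the tape in the drive, NULL if empty */
    int offline_calls;    /* how many times the tape was ejected */
};

/* Eject: the drive is empty afterwards. */
static void sim_offline(struct sim_dev *dev)
{
    dev->loaded = NULL;
    dev->offline_calls++;
}

/* One polling attempt for the volume 'wanted'.
 * Returns 1 if the wanted volume is in the drive, 0 to keep waiting. */
int sim_poll(struct sim_dev *dev, const char *wanted)
{
    if (dev->loaded == NULL) {
        return 0;                     /* no tape: nothing to read yet */
    }
    if (strcmp(dev->loaded, wanted) == 0) {
        return 1;                     /* right volume: carry on */
    }
    /* "Vol NAME ERROR" while polling: same test as the mount.c diff. */
    if ((dev->cap_bits & SIM_CAP_OFFLINEONPOLL) &&
        !(dev->cap_bits & SIM_CAP_AUTOCHANGER)) {
        sim_offline(dev);
    }
    return 0;
}
```

With cap_bits == 0 (option off) the wrong tape simply stays in the drive, which is the ambiguous "tape still loaded" state the operator cannot tell apart from a successful night.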

This setup has been repeated with v2.1.10 of the Director and (the modified
source) Storage daemons for 1 week now with full success.
Intensive testing has been done (many deliberate wrong tape insertions,
inserting a recycled/purged tape different from the one expected by the SD/Dir,
...) on the 2.0.3 version with apparently no problems or memory leaks (at least not
any 'visible' ones).

To accomplish this, very few changes to the source code are needed.

For those interested, here are the diffs for the modified source files (based
on the v2.1.10 source tree):
--------------------
./stored/dev.h-2.1.10.diff
118a119
> #define CAP_OFFLINEONPOLL  (1<<23)    /* Go OffLine when polling for a new volume */

./stored/mount.c-2.1.10.diff
264a265,270
>          if ( dev->has_cap(CAP_OFFLINEONPOLL) && !(dev->has_cap(CAP_AUTOCHANGER)) )
>                 {
>                         Dmsg0(200, "Vol NAME ERROR && poll && CAP_OFFLINEONPOLL ==> going offline\n");
>                         dev->offline();
>                 }
>

./stored/stored_conf.c-2.1.10.diff
134a135
>    {"offlineonpoll",         store_bit,  ITEM(res_dev.cap_bits), CAP_OFFLINEONPOLL, ITEM_DEFAULT, 0},
345a347,349
>       if (res->res_dev.cap_bits & CAP_OFFLINEONPOLL) {
>          bstrncat(buf, "CAP_OFFLINEONPOLL ", sizeof(buf));
>       }
--------------------
That's all folks!
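The stored_conf.c hunk registers the directive with store_bit, so "OfflineOnPoll = yes" in a Device resource ends up as the CAP_OFFLINEONPOLL bit in cap_bits. The sketch below is a simplified, hypothetical stand-in for that table-driven mechanism: the table, the function names and the other bit values are invented for illustration, and Bacula's real parser (including which value spellings it accepts) is not reproduced here. This sketch accepts only yes/no and matches keywords case-insensitively.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Bit values for this sketch; only OFFLINEONPOLL matches the dev.h diff. */
#define SK_CAP_AUTOCHANGER   (1u << 0)
#define SK_CAP_ALWAYSOPEN    (1u << 1)
#define SK_CAP_OFFLINEONPOLL (1u << 23)

struct bit_directive {
    const char *name;   /* directive keyword, compared case-insensitively */
    uint32_t bit;       /* capability bit it controls */
};

static const struct bit_directive bit_directives[] = {
    {"autochanger",   SK_CAP_AUTOCHANGER},
    {"alwaysopen",    SK_CAP_ALWAYSOPEN},
    {"offlineonpoll", SK_CAP_OFFLINEONPOLL},
};

/* Case-insensitive string equality: 1 if equal, 0 otherwise. */
static int ieq(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) {
            return 0;
        }
        a++;
        b++;
    }
    return *a == *b;   /* equal only if both strings ended together */
}

/* Apply "name = value": "yes" sets the bit, "no" clears it.
 * Returns 0 on success, -1 for an unknown directive or value. */
int apply_bit_directive(uint32_t *cap_bits, const char *name, const char *value)
{
    for (size_t i = 0; i < sizeof bit_directives / sizeof bit_directives[0]; i++) {
        if (!ieq(name, bit_directives[i].name)) {
            continue;
        }
        if (ieq(value, "yes")) {
            *cap_bits |= bit_directives[i].bit;
        } else if (ieq(value, "no")) {
            *cap_bits &= ~bit_directives[i].bit;
        } else {
            return -1;
        }
        return 0;
    }
    return -1;
}
```

Keeping one bit per boolean directive is what lets the mount.c test be a single has_cap() check; it also means a new option costs only a table entry plus a free bit, as the diff shows.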

For Kern:
  - If you think this new option is valuable for the Bacula project, could
you please merge the diffs into a forthcoming version (not necessarily the next
one if you haven't got the time)?
  - For the new option CAP_OFFLINEONPOLL in dev.h, I arbitrarily took the next
available bit; change this as you see fit.
  - A question I can't answer (I haven't got an autochanger to test it, nor
enough time to check the source code) concerns the test on CAP_AUTOCHANGER in
mount.c: I don't think the OfflineOnPoll option is of any interest for an
autochanger (especially when the autochanger is controlled via the mtx
script!). I'll let you check whether this test is OK and/or has any side effects.
  - Maybe there is a bad side effect (I'm definitely not a Bacula developer
guru, so I don't have the overall context / interactions in mind and haven't had
time to check this either!) if you have AlwaysOpen=yes _and_ OfflineOnPoll=yes:
I don't know what effect (if any) this has on the device struct and/or file descriptor.


Any questions/remarks are welcome.

Marc.

PS: I don't currently subscribe to the -devel list, so this post is only
on the -users one; if anyone wants to copy/follow it up on the -devel list, they are
welcome to! If I really absolutely need to subscribe to the -devel list, well, I
will... (please cc me to let me know)!

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
