I sent the following email to the bacula-devel list this morning, but again, it
seems not to have turned up. I have now added to this at the bottom.

On Thu, Jul 02, 2009 at 11:04:44AM +0200, Kern Sibbald wrote:
> One thing that I don't understand is the need to build the Win64 version, 
> unless you have changes that you have not sent to the project.  In that case, 
> I kindly ask you to tell me where I can find the changes so that I can 
> download them as the license requires.  We will help you as much as we can, 
> and you in turn should help the project :-)

Hello,
Yes, I have changes to do with making restore from multiple storage daemons
work. I have already emailed the patch file to the list.
You can find it here: http://markmail.org/message/t2te5k45qmbr7ssv

The last time this was talked about, you gave me a list of six 'rather trivial
but essential modifications' that you wanted me to do (which I am finally
getting round to doing):

> 1. You haven't followed the developers programming style guidelines
>      that are described in the developers manual.
> 
> 2. In addition to the guidelines documented, we do not use // for comments
>     except in column 1 to turn code off.  Please use C style comments.
> 
> 3. We do not in general return -1 to mean true and zero to mean false.  We use
>     true and false and a bool type.  The exception to this is sometimes some
>     very low level routines that mimic the Unix OS call convention.
> 
> 4. There aren't many comments.  In particular, you have changed a major
>      flow within Bacula, so that merits at least some comments so that
>      the overall concept of what you are implementing can be easily
>      understood.  I think the essential elements are in your email message
>      below, and it is just a matter of putting it in an appropriate place
>      in the code.
> 
> 5. We will also need some user documentation that explains the ramifications
>     of this change for users.
> 
> 6. Can you specify (email or possibly in the user documentation) what happens
>    if there is an error with one of the storage daemons -- i.e. does
>    everything shutdown correctly, and is it clear to the user which storage
>    daemon had the problem?
> 
> I would appreciate it if you would make the above rather trivial but essential
> modifications to your patch and then resend it to me.


(A day later)...
The attached patch addresses points 1-4 above and applies to the bacula-3.0.1
source.

> 5. We will also need some user documentation that explains the ramifications
>   of this change for users.

I don't know what you mean by 'some user documentation' - whether it should be
a patch to something, for example.
But here is my explanation of the ramifications:

These changes mean that you can restore from volumes held by multiple storage
daemons - for example, SD1 has a Full backup and SD2 has your incrementals.
Previously, in this situation, the file daemon would just try to use a single
storage daemon and would get stuck when it got to a volume that the storage
daemon knew nothing about.
If you want the new functionality, you will need to update both your director
and your file daemon (there are no changes to the storage daemon).
If you update only the director, you do not need to update your file daemon:
the old file daemon will carry on working as before, restoring from a single
storage daemon.
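
To illustrate the idea, here is a minimal standalone sketch (not code from the
patch) of how a bootstrap file can be cut into contiguous per-storage pieces
using the Storage= lines. The Segment type and the storage_name() and
split_by_storage() functions are hypothetical names made up for this example.
Unlike the patch, which parses lines with parse_ua_args() and matches the
keyword case-insensitively, the sketch does a plain case-sensitive prefix
match.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical type for this sketch: one contiguous run of bootstrap
// lines that all belong to the same storage.
struct Segment {
    std::string storage;             // name taken from the Storage= line
    std::vector<std::string> lines;  // lines that would be sent to the fd
};

// Return the name in a Storage="name" line, or "" for any other line.
static std::string storage_name(const std::string &line)
{
    const std::string key = "Storage=";
    if (line.compare(0, key.size(), key) != 0) {
        return "";
    }
    std::string value = line.substr(key.size());
    while (!value.empty() && (value.back() == '\n' || value.back() == '\r')) {
        value.pop_back();
    }
    if (value.size() >= 2 && value.front() == '"' && value.back() == '"') {
        value = value.substr(1, value.size() - 2);
    }
    return value;
}

// Group bootstrap lines into segments, starting a new segment whenever
// the storage name changes.
std::vector<Segment> split_by_storage(const std::vector<std::string> &bsr)
{
    std::vector<Segment> segments;
    for (const std::string &line : bsr) {
        const std::string name = storage_name(line);
        if (!name.empty()) {
            if (segments.empty() || segments.back().storage != name) {
                segments.push_back(Segment{name, {}});
            }
            continue;  // Storage= lines themselves are not forwarded
        }
        if (segments.empty()) {
            continue;  // this sketch skips lines before the first Storage=
        }
        assert(!segments.back().storage.empty());
        segments.back().lines.push_back(line);
    }
    return segments;
}
```

With a Full backup on SD1 followed by incrementals on SD2, this gives two
segments, and the file daemon would be pointed at each storage daemon in turn.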

> 6. Can you specify (email or possibly in the user documentation) what happens
>    if there is an error with one of the storage daemons -- i.e. does
>    everything shutdown correctly, and is it clear to the user which storage
>    daemon had the problem?

It's not very different from what happened before.
After the director sends a bootstrap file and the restore command to the file
daemon, it calls wait_for_storage_daemon_termination().
Once that returns, if jcr->SDJobStatus != JS_Terminated, the director bails
out of its restore loop and then cleans up the file daemon connection.
Because the process stops at the first storage daemon that fails, the job
report that gets sent/printed out at the end will name that storage daemon,
so it should be pretty obvious to the user which SD had the problem.
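
The stop-at-first-failure behaviour can be sketched roughly like this. It is a
simplified standalone model, not the director code: SdStatus, RestoreResult and
restore_from_all() are hypothetical names, and the restore_one callback stands
in for everything the director does for one storage daemon.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Stand-in for the storage daemon job status (JS_Terminated or otherwise).
enum class SdStatus { Terminated, Error };

struct RestoreResult {
    bool ok;
    std::string failed_storage;  // empty when every storage succeeded
};

// Visit the storages in order and stop at the first one that does not
// terminate cleanly, remembering its name for the job report.
RestoreResult restore_from_all(
    const std::vector<std::string> &storages,
    const std::function<SdStatus(const std::string &)> &restore_one)
{
    assert(static_cast<bool>(restore_one));
    for (const std::string &name : storages) {
        if (restore_one(name) != SdStatus::Terminated) {
            return {false, name};  // later storages are never contacted
        }
    }
    return {true, ""};
}
```

If the second of three storages fails, the third is never contacted and the
result carries the name of the second one.
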
Index: WORK/src/dird/backup.c
===================================================================
RCS file: /cvs/netpilot/GPL/bacula-3.0.1/WORK/src/dird/backup.c,v
retrieving revision 1.1
diff -u -r1.1 backup.c
--- WORK/src/dird/backup.c	16 Jun 2009 15:14:46 -0000	1.1
+++ WORK/src/dird/backup.c	2 Jul 2009 16:29:15 -0000
@@ -47,7 +47,7 @@
 
 /* Commands sent to File daemon */
 static char backupcmd[] = "backup\n";
-static char storaddr[]  = "storage address=%s port=%d ssl=%d\n";
+static char storaddr[]  = "storage address=%s port=%d ssl=%d Authorization=%s\n";
 
 /* Responses received from File daemon */
 static char OKbackup[]   = "2000 OK backup\n";
@@ -278,7 +278,8 @@
       }
    }
 
-   fd->fsend(storaddr, store->address, store->SDDport, tls_need);
+   fd->fsend(storaddr, store->address, store->SDDport, tls_need,
+      jcr->sd_auth_key);
    if (!response(jcr, fd, OKstore, "Storage", DISPLAY_ERROR)) {
       goto bail_out;
    }
@@ -339,11 +340,10 @@
 
 /*
  * Here we wait for the File daemon to signal termination,
- *   then we wait for the Storage daemon.  When both
+ *   then we wait for the Storage daemon, if asked to.  When both
  *   are done, we return the job status.
- * Also used by restore.c
  */
-int wait_for_job_termination(JCR *jcr, int timeout)
+static int wait_for_job_termination_do(JCR *jcr, int timeout, bool storage_too)
 {
    int32_t n = 0;
    BSOCK *fd = jcr->file_bsock;
@@ -397,7 +397,9 @@
    }
 
    /* Note, the SD stores in jcr->JobFiles/ReadBytes/JobBytes/JobErrors */
-   wait_for_storage_daemon_termination(jcr);
+   if(storage_too) {
+      wait_for_storage_daemon_termination(jcr);
+   }
 
    /* Return values from FD */
    if (fd_ok) {
@@ -425,7 +427,28 @@
    if (jcr->FDJobStatus != JS_Terminated) {
       return jcr->FDJobStatus;
    }
-   return jcr->SDJobStatus;
+   if(storage_too) {
+      return jcr->SDJobStatus;
+   }
+   return jcr->FDJobStatus;
+}
+
+/* This waits for both the fd and the sd to finish.
+   When both are done, we return the job status.
+   Also used by restore.c and verify.c */
+int wait_for_job_termination(JCR *jcr, int timeout)
+{
+   return wait_for_job_termination_do(jcr, timeout,
+      true /* wait for storage termination too */ );
+}
+
+/* This waits for only the fd to finish.
+   When done, we return the job status.
+   Also used by restore.c */
+int wait_for_job_termination_fd(JCR *jcr, int timeout)
+{
+   return wait_for_job_termination_do(jcr, timeout,
+      false /* do not wait for storage termination too */ );
 }
 
 /*
Index: WORK/src/dird/bsr.c
===================================================================
RCS file: /cvs/netpilot/GPL/bacula-3.0.1/WORK/src/dird/bsr.c,v
retrieving revision 1.1
diff -u -r1.1 bsr.c
--- WORK/src/dird/bsr.c	16 Jun 2009 15:14:46 -0000	1.1
+++ WORK/src/dird/bsr.c	2 Jul 2009 16:29:15 -0000
@@ -353,6 +353,11 @@
                find_storage_resource(ua, rx, bsr->VolParams[i].Storage,
                                              bsr->VolParams[i].MediaType);
             }
+            /* Storage daemons do not understand Storage= lines. Need to
+               put them in here so that the director can break up the bootstrap
+               file into multiple parts to send to the fd when restoring from
+               multiple sds. They are stripped out when sent to the fd. */
+            fprintf(fd, "Storage=\"%s\"\n", bsr->VolParams[i].Storage);
             fprintf(fd, "Volume=\"%s\"\n", bsr->VolParams[i].VolumeName);
             fprintf(fd, "MediaType=\"%s\"\n", bsr->VolParams[i].MediaType);
             if (bsr->fileregex) {
@@ -409,6 +414,11 @@
                find_storage_resource(ua, rx, bsr->VolParams[i].Storage,
                                              bsr->VolParams[i].MediaType);
             }
+            /* Storage daemons do not understand Storage= lines. Need to
+               put them in here so that the director can break up the bootstrap
+               file into multiple parts to send to the fd when restoring from
+               multiple sds. They are stripped out when sent to the fd. */
+            fprintf(fd, "Storage=\"%s\"\n", bsr->VolParams[i].Storage);
             fprintf(fd, "Volume=\"%s\"\n", bsr->VolParams[i].VolumeName);
             fprintf(fd, "MediaType=\"%s\"\n", bsr->VolParams[i].MediaType);
             if (bsr->fileregex) {
Index: WORK/src/dird/fd_cmds.c
===================================================================
RCS file: /cvs/netpilot/GPL/bacula-3.0.1/WORK/src/dird/fd_cmds.c,v
retrieving revision 1.1
diff -u -r1.1 fd_cmds.c
--- WORK/src/dird/fd_cmds.c	16 Jun 2009 15:14:46 -0000	1.1
+++ WORK/src/dird/fd_cmds.c	2 Jul 2009 16:29:15 -0000
@@ -46,9 +46,13 @@
 
 const int dbglvl = 400;
 
-/* Commands sent to File daemon */
+/* Commands sent to File daemon
+   Old versions of the file daemon expect to receive an sd_auth_key with
+   the jobcmd[]. Newer versions (FDVersion >= 2) expect it in a storaddr[] cmd,
+   so that they can connect to more than one storage daemon when restoring. */
 static char filesetcmd[]  = "fileset%s\n"; /* set full fileset */
-static char jobcmd[]      = "JobId=%s Job=%s SDid=%u SDtime=%u Authorization=%s\n";
+static char jobcmd[]      = "JobId=%s Job=%s SDid=%u SDtime=%u\n";
+static char jobcmd_legacy[] = "JobId=%s Job=%s SDid=%u SDtime=%u Authorization=%s\n";
 /* Note, mtime_only is not used here -- implemented as file option */
 static char levelcmd[]    = "level = %s%s%s mtime_only=%d\n";
 static char runscript[]   = "Run OnSuccess=%u OnFailure=%u AbortOnError=%u When=%u Command=%s\n";
@@ -114,13 +118,25 @@
    }
 
    /*
-    * Now send JobId and authorization key
+    * Now send JobId and maybe an authorization key.
+    *
+    * Newer versions receive the sd_auth_key in a storaddr[] command - one
+    * for each sd. Older versions only know how to accept the key in a
+    * jobcmd[].
     */
-   fd->fsend(jobcmd, edit_int64(jcr->JobId, ed1), jcr->Job, jcr->VolSessionId,
-      jcr->VolSessionTime, jcr->sd_auth_key);
-   if (strcmp(jcr->sd_auth_key, "dummy") != 0) {
-      memset(jcr->sd_auth_key, 0, strlen(jcr->sd_auth_key));
+   if (jcr->FDVersion >= 2) {
+      fd->fsend(jobcmd, edit_int64(jcr->JobId, ed1), jcr->Job,
+         jcr->VolSessionId, jcr->VolSessionTime);
+   }
+   else {
+      /* jcr->FDVersion == 0, or 1 */
+      fd->fsend(jobcmd_legacy, edit_int64(jcr->JobId, ed1), jcr->Job,
+         jcr->VolSessionId, jcr->VolSessionTime, jcr->sd_auth_key);
+      if (strcmp(jcr->sd_auth_key, "dummy") != 0) {
+         memset(jcr->sd_auth_key, 0, strlen(jcr->sd_auth_key));
+      }
    }
+
    Dmsg1(100, ">filed: %s", fd->msg);
    if (bget_dirmsg(fd) > 0) {
        Dmsg1(110, "<filed: %s", fd->msg);
@@ -539,7 +555,10 @@
    }
    sock->fsend(bootstrap);
    while (fgets(buf, sizeof(buf), bs)) {
-      sock->fsend("%s", buf);
+      /* Storage daemons do not understand 'Storage=' lines! */
+      if (strncmp(buf, "Storage=", strlen("Storage="))) {
+         sock->fsend("%s", buf);
+      }
    }
    sock->signal(BNET_EOD);
    fclose(bs);
Index: WORK/src/dird/protos.h
===================================================================
RCS file: /cvs/netpilot/GPL/bacula-3.0.1/WORK/src/dird/protos.h,v
retrieving revision 1.1
diff -u -r1.1 protos.h
--- WORK/src/dird/protos.h	16 Jun 2009 15:14:46 -0000	1.1
+++ WORK/src/dird/protos.h	2 Jul 2009 16:29:15 -0000
@@ -52,7 +52,8 @@
 extern bool find_recycled_volume(JCR *jcr, bool InChanger, MEDIA_DBR *mr);
 
 /* backup.c */
-extern int wait_for_job_termination(JCR *jcr, int timeout=0);
+extern int wait_for_job_termination(JCR *jcr, int timeout = 0);
+extern int wait_for_job_termination_fd(JCR *jcr, int timeout = 0);
 extern bool do_backup_init(JCR *jcr);
 extern bool do_backup(JCR *jcr);
 extern void backup_cleanup(JCR *jcr, int TermCode);
Index: WORK/src/dird/restore.c
===================================================================
RCS file: /cvs/netpilot/GPL/bacula-3.0.1/WORK/src/dird/restore.c,v
retrieving revision 1.1
diff -u -r1.1 restore.c
--- WORK/src/dird/restore.c	16 Jun 2009 15:14:46 -0000	1.1
+++ WORK/src/dird/restore.c	2 Jul 2009 16:29:15 -0000
@@ -52,54 +52,64 @@
 /* Commands sent to File daemon */
 static char restorecmd[]  = "restore replace=%c prelinks=%d where=%s\n";
 static char restorecmdR[] = "restore replace=%c prelinks=%d regexwhere=%s\n";
-static char storaddr[]    = "storage address=%s port=%d ssl=0\n";
+static char endrestorecmd[] = "endrestore\n";
+static char storaddr[]    = "storage address=%s port=%d ssl=0 Authorization=%s\n";
 
 /* Responses received from File daemon */
 static char OKrestore[]   = "2000 OK restore\n";
 static char OKstore[]     = "2000 OK storage\n";
+static char OKstoreend[]  = "2000 OK storage end\n";
 static char OKbootstrap[] = "2000 OK bootstrap\n";
 
-/*
- * Do a restore of the specified files
- *
- *  Returns:  0 on failure
- *            1 on success
- */
-bool do_restore(JCR *jcr)
+/* Sends the contiguous bootstrap segments for a particular storage daemon.
+   Leaves 'FILE *bs' at the beginning of the next segment. */
+static bool send_partial_bootstrap_file(JCR *jcr, BSOCK *sock, const char *storage, FILE *bs)
 {
-   BSOCK   *fd;
-   JOB_DBR rjr;                       /* restore job record */
-   char replace, *where, *cmd;
-   char empty = '\0';
-   int stat;
+   fpos_t pos;
+   char buf[1000];
+   const char *bootstrap = "bootstrap\n";
+   UAContext *ua = NULL;
 
-   free_wstorage(jcr);                /* we don't write */
-
-   if (!allow_duplicate_job(jcr)) {
+   Dmsg1(400, "send_bootstrap_file: %s\n", jcr->RestoreBootstrap);
+   if (!jcr->RestoreBootstrap) {
       goto bail_out;
    }
+   sock->fsend(bootstrap);
+   ua = new_ua_context(jcr);
+   while(!fgetpos(bs, &pos) && fgets(buf, sizeof(buf), bs)) {
+      Mmsg(ua->cmd, "%s", buf);
+      parse_ua_args(ua);
+      if(ua->argc != 1) {
+         continue;
+      }
+      if(ua->argk[0] && !strcasecmp(ua->argk[0], "Storage")) {
+         /* Continue if this is a volume from the same storage. */
+         if(ua->argv[0] && !strcmp(ua->argv[0], storage)) {
+            continue;
+         }
+         /* Otherwise, we need to contact another storage daemon.
+            Reset bs to the beginning of the next segment. */
+         fsetpos(bs, &pos);
+         break;
+      }
 
-   memset(&rjr, 0, sizeof(rjr));
-   jcr->jr.JobLevel = L_FULL;         /* Full restore */
-   if (!db_update_job_start_record(jcr, jcr->db, &jcr->jr)) {
-      Jmsg(jcr, M_FATAL, 0, "%s", db_strerror(jcr->db));
-      goto bail_out;
+      sock->fsend("%s", buf);
    }
-   Dmsg0(20, "Updated job start record\n");
-
-   Dmsg1(20, "RestoreJobId=%d\n", jcr->job->RestoreJobId);
-
-   if (!jcr->RestoreBootstrap) {
-      Jmsg0(jcr, M_FATAL, 0, _("Cannot restore without a bootstrap file.\n"
-          "You probably ran a restore job directly. All restore jobs must\n"
-          "be run using the restore command.\n"));
-      goto bail_out;
+   free_ua_context(ua);
+   sock->signal(BNET_EOD);
+   if (jcr->unlink_bsr) {
+      unlink(jcr->RestoreBootstrap);
+      jcr->unlink_bsr = false;
    }
+   return true;
+bail_out:
+   return false;
+}
 
-
-   /* Print Job Start message */
-   Jmsg(jcr, M_INFO, 0, _("Start Restore Job %s\n"), jcr->Job);
-
+/* Starts conversation with a Storage daemon, starts a job with it, and
+   starts a Storage daemon message thread. */
+static bool start_storage_daemon(JCR *jcr)
+{
    /*
     * Open a message channel connection with the Storage
     * daemon. This is to let him know that our client
@@ -131,18 +141,14 @@
    }
    Dmsg0(50, "Storage daemon connection OK\n");
 
+   return true;
+bail_out:
+   return false;
+}
 
-   /*
-    * Start conversation with File daemon
-    */
-   set_jcr_job_status(jcr, JS_WaitFD);
-   if (!connect_to_file_daemon(jcr, 10, FDConnectTimeout, 1)) {
-      goto bail_out;
-   }
-
-   fd = jcr->file_bsock;
-   set_jcr_job_status(jcr, JS_Running);
-
+static bool wait_for_fd_connect_to_sd(JCR *jcr)
+{
+   BSOCK   *fd;
    /*
     * send Storage daemon address to the File daemon,
     *   then wait for File daemon to make connection
@@ -151,26 +157,29 @@
    if (jcr->rstore->SDDport == 0) {
       jcr->rstore->SDDport = jcr->rstore->SDport;
    }
-   fd->fsend(storaddr, jcr->rstore->address, jcr->rstore->SDDport);
+   fd = jcr->file_bsock;
+   /* File daemons with FDVersion >= 2 understand the sd_auth_key turning
+      up in the storaddr command.
+      Older versions will just ignore the extra field on the end. */
+   fd->fsend(storaddr,
+      jcr->rstore->address, jcr->rstore->SDDport, jcr->sd_auth_key);
+   if(jcr->sd_auth_key) {
+      bfree(jcr->sd_auth_key);
+      jcr->sd_auth_key = NULL;
+   }
    Dmsg1(6, "dird>filed: %s\n", fd->msg);
    if (!response(jcr, fd, OKstore, "Storage", DISPLAY_ERROR)) {
-      goto bail_out;
-   }
-
-   /*
-    * Send the bootstrap file -- what Volumes/files to restore
-    */
-   if (!send_bootstrap_file(jcr, fd) ||
-       !response(jcr, fd, OKbootstrap, "Bootstrap", DISPLAY_ERROR)) {
-      goto bail_out;
-   }
-
-
-   if (!send_runscripts_commands(jcr)) {
-      goto bail_out;
+      return false;
    }
+   return true;
+}
 
-   /* Send restore command */
+/* Send restore command and wait for the storage daemon to finish. */
+static bool send_restore_command(JCR *jcr)
+{
+   BSOCK   *fd;
+   char empty = '\0';
+   char replace, *where, *cmd;
 
    if (jcr->replace != 0) {
       replace = jcr->replace;
@@ -202,6 +211,7 @@
    jcr->prefix_links = jcr->job->PrefixLinks;
 
    bash_spaces(where);
+   fd = jcr->file_bsock;
    fd->fsend(cmd, replace, jcr->prefix_links, where);
    unbash_spaces(where);
 
@@ -209,13 +219,284 @@
       goto bail_out;
    }
 
-   /* Wait for Job Termination */
-   stat = wait_for_job_termination(jcr);
+   /* Need to wait for the storage daemon to finish before returning and
+      moving onto the next storage daemon, if any.
+      Note, the SD stores in jcr->JobFiles/ReadBytes/JobBytes/JobErrors */
+   wait_for_storage_daemon_termination(jcr);
+
+   return true;
+bail_out:
+   return false;
+}
+
+/* Called each time round the loop in loop_bootstrap(), once for each
+   contiguous piece of bootstrap file with the same storage.
+   Sends the partial bootstrap file and then the restore command to the fd. */
+static bool do_storage(JCR *jcr, const char *storage, FILE *bs)
+{
+   BSOCK   *fd;
+
+   if(!jcr->store_bsock && !start_storage_daemon(jcr)) {
+      goto bail_out;
+   }
+
+   if(!wait_for_fd_connect_to_sd(jcr)) {
+      goto bail_out;
+   }
+
+   /*
+    * Send the bootstrap file -- what Volumes/files to restore
+    */
+   fd = jcr->file_bsock;
+   if (!send_partial_bootstrap_file(jcr, fd, storage, bs) ||
+       !response(jcr, fd, OKbootstrap, "Bootstrap", DISPLAY_ERROR)) {
+      goto bail_out;
+   }
+
+   if(!send_restore_command(jcr)) {
+      goto bail_out;
+   }
+
+   /* storage daemon status is in jcr->SDJobStatus; */
+
+   return true;
+bail_out:
+   return false;
+}
+
+/* Legacy, for FDVersion < 2, which expects the whole bootstrap file to be sent
+   and cannot deal with multiple storages. */
+static bool single_bootstrap(JCR *jcr)
+{
+   BSOCK   *fd;
+   /* storage daemon already started */
+
+   if(!wait_for_fd_connect_to_sd(jcr)) {
+      goto bail_out;
+   }
+
+   /*
+    * Send the bootstrap file -- what Volumes/files to restore
+    */
+   fd = jcr->file_bsock;
+   if (!send_bootstrap_file(jcr, fd) ||
+       !response(jcr, fd, OKbootstrap, "Bootstrap", DISPLAY_ERROR)) {
+      goto bail_out;
+   }
+
+   if(!send_restore_command(jcr)) {
+      return false;
+   }
+
+   return true;
+bail_out:
+   return false;
+}
+
+/* Go through the bootstrap file, picking out the individual storages for the
+   purpose of asking the file daemon to restore from them one by one. */
+static bool loop_bootstrap(JCR *jcr)
+{
+   FILE *bs = NULL;
+   char buf[1000];
+   int scount = 0;
+   UAContext *ua = new_ua_context(jcr);
+   STORE *rstore = jcr->rstore;
+   bs = fopen(jcr->RestoreBootstrap, "rb");
+   if (!bs) {
+      berrno be;
+      Jmsg(jcr, M_FATAL, 0, _("Could not open bootstrap file %s: ERR=%s\n"),
+         jcr->RestoreBootstrap, be.bstrerror());
+      set_jcr_job_status(jcr, JS_ErrorTerminated);
+      goto bail_out;
+   }
+
+   while(fgets(buf, sizeof(buf), bs)) {
+      USTORE ustore;
+      Mmsg(ua->cmd, "%s", buf);
+      parse_ua_args(ua);
+      if(ua->argc != 1
+        || (ua->argk[0] && strcasecmp(ua->argk[0], "Storage"))
+        || !ua->argv[0] || !*(ua->argv[0])) {
+         continue;
+      }
+
+      /* We should already have connected to the first storage via the old
+         method of connecting to the storage daemon first, before connecting
+         to the file daemon.
+         So, only set up a new storage if we are past the first one.
+      */
+      if(scount > 0) {
+         if(!(ustore.store = (STORE *)GetResWithName(R_STORAGE, ua->argv[0]))) {
+            Jmsg(jcr, M_FATAL, 0,
+               _("Could not get storage resource '%s'.\n"), ua->argv[0]);
+            set_jcr_job_status(jcr, JS_ErrorTerminated);
+            goto bail_out;
+         }
+
+         free_rstorage(jcr);
+         set_rstorage(jcr, &ustore);
+
+         if(jcr->store_bsock) {
+             jcr->store_bsock->destroy();
+             jcr->store_bsock = NULL;
+         }
+      }
+
+      /* do_storage() will restore the files needed from this particular
+         storage */      
+      if(!do_storage(jcr, ua->argv[0], bs)) {
+         Jmsg(jcr, M_FATAL, 0,
+            _("Restoring from storage '%s' failed.\n"), ua->argv[0]);
+         set_jcr_job_status(jcr, JS_ErrorTerminated);
+         goto bail_out;
+      }
+
+      /* storage daemon status is in jcr->SDJobStatus; */
+      if (jcr->SDJobStatus != JS_Terminated) {
+         goto bail_out;
+      }
+
+      /* tell the fd that this is the end of this storage */
+      if (!response(jcr, jcr->file_bsock,
+       OKstoreend, "Storage end", DISPLAY_ERROR)) {
+         goto bail_out;
+      }
+
+      scount++;
+   }
+
+   if(bs) {
+      fclose(bs);
+   }
+   free_ua_context(ua);
+   jcr->rstore = rstore;
+
+   return true;
+bail_out:
+   if(bs) {
+      fclose(bs);
+   }
+   free_ua_context(ua);
+   jcr->rstore = rstore;
+
+   return false;
+}
+
+/*
+ * Do a restore of the specified files
+ *
+ *  Returns:  0 on failure
+ *            1 on success
+ *
+ * Here is what happens:
+ * The director connects to the file daemon.
+ * Then (if FDVersion >= 2), for each storage daemon in the .bsr file... {
+ *    The director connects to the storage daemon, and gets an
+ *    sd_auth_key. The director then connects to the file daemon, and gives
+ *    it the sd_auth_key with the 'storaddr' command.
+ *    (restoring of files happens)
+ *    The director does a 'wait_for_storage_daemon_termination()'.
+ *    The director waits for the file daemon to indicate the end of the
+ *    work on this storage.
+ * }
+ * (If FDversion<2, the whole bootstrap file is sent to the fd, as it expects)
+ * The director tells the file daemon that there are no more storages to
+ * contact. The director waits for the file daemon to indicate the end of
+ * the job.
+ */
+bool do_restore(JCR *jcr)
+{
+   JOB_DBR rjr;                       /* restore job record */
+   int stat;
+
+   free_wstorage(jcr);                /* we don't write */
+
+   if (!allow_duplicate_job(jcr)) {
+      goto bail_out;
+   }
+
+   memset(&rjr, 0, sizeof(rjr));
+   jcr->jr.JobLevel = L_FULL;         /* Full restore */
+   if (!db_update_job_start_record(jcr, jcr->db, &jcr->jr)) {
+      Jmsg(jcr, M_FATAL, 0, "%s", db_strerror(jcr->db));
+      goto bail_out;
+   }
+   Dmsg0(20, "Updated job start record\n");
+
+   Dmsg1(20, "RestoreJobId=%d\n", jcr->job->RestoreJobId);
+
+   if (!jcr->RestoreBootstrap) {
+      Jmsg0(jcr, M_FATAL, 0, _("Cannot restore without a bootstrap file.\n"
+          "You probably ran a restore job directly. All restore jobs must\n"
+          "be run using the restore command.\n"));
+      goto bail_out;
+   }
+
+   /* Print Job Start message */
+   Jmsg(jcr, M_INFO, 0, _("Start Restore Job %s\n"), jcr->Job);
+
+   /* Old file daemons needed us to connect to the SD first, and get a
+      jcr->sd_auth_key. */
+   if(!start_storage_daemon(jcr)) {
+         goto bail_out;
+   }
+
+   /*
+    * Start conversation with File daemon
+    */
+   set_jcr_job_status(jcr, JS_WaitFD);
+   if (!connect_to_file_daemon(jcr, 10, FDConnectTimeout, 1)) {
+      goto bail_out;
+   }
+   /* Once we have connected to the file daemon, we know its version
+      (jcr->FDVersion). */
+
+   set_jcr_job_status(jcr, JS_Running);
+
+   if (!send_runscripts_commands(jcr)) {
+      goto bail_out;
+   }
+
+   if (jcr->FDVersion >= 2) {
+      if (!loop_bootstrap(jcr)) {
+         goto bail_out;
+      }
+      if (jcr->SDJobStatus != JS_Terminated) {
+         goto bail_out;
+      }
+
+      /* Tell the file daemon that there is nothing more to do. */
+      jcr->file_bsock->fsend(endrestorecmd);
+
+      /* Wait for Job Termination */
+      stat = wait_for_job_termination_fd(jcr);
+   } else {
+      /* Legacy */
+      if (!single_bootstrap(jcr)) {
+         goto bail_out;
+      }
+      /* Wait for Job Termination */
+      stat = wait_for_job_termination(jcr);
+   }
+
    restore_cleanup(jcr, stat);
    return true;
 
 bail_out:
-   restore_cleanup(jcr, JS_ErrorTerminated);
+   /* Need to make sure the FD is cleaned up */
+   BSOCK *fd = jcr->file_bsock;
+   if(jcr->file_bsock) {
+      fd->fsend("cancel Job=%s\n", jcr->Job);
+      while (fd->recv() >= 0) {
+      }
+      fd->signal(BNET_TERMINATE);
+      fd->close();
+      jcr->file_bsock = NULL;
+   }
+
+   /* caller ends up doing its own restore_cleanup() when returning false.
+   restore_cleanup(jcr, JS_ErrorTerminated); */
    return false;
 }
 
Index: WORK/src/dird/verify.c
===================================================================
RCS file: /cvs/netpilot/GPL/bacula-3.0.1/WORK/src/dird/verify.c,v
retrieving revision 1.1
diff -u -r1.1 verify.c
--- WORK/src/dird/verify.c	16 Jun 2009 15:14:46 -0000	1.1
+++ WORK/src/dird/verify.c	2 Jul 2009 16:29:15 -0000
@@ -48,7 +48,7 @@
 
 /* Commands sent to File daemon */
 static char verifycmd[]    = "verify level=%s\n";
-static char storaddr[]     = "storage address=%s port=%d ssl=0\n";
+static char storaddr[]     = "storage address=%s port=%d ssl=0 Authorization=%s\n";
 
 /* Responses received from File daemon */
 static char OKverify[]    = "2000 OK verify\n";
@@ -267,7 +267,8 @@
       if (jcr->rstore->SDDport == 0) {
          jcr->rstore->SDDport = jcr->rstore->SDport;
       }
-      bnet_fsend(fd, storaddr, jcr->rstore->address, jcr->rstore->SDDport);
+      bnet_fsend(fd, storaddr, jcr->rstore->address, jcr->rstore->SDDport,
+         jcr->sd_auth_key);
       if (!response(jcr, fd, OKstore, "Storage", DISPLAY_ERROR)) {
          goto bail_out;
       }
Index: WORK/src/filed/authenticate.c
===================================================================
RCS file: /cvs/netpilot/GPL/bacula-3.0.1/WORK/src/filed/authenticate.c,v
retrieving revision 1.1
diff -u -r1.1 authenticate.c
--- WORK/src/filed/authenticate.c	16 Jun 2009 15:14:46 -0000	1.1
+++ WORK/src/filed/authenticate.c	2 Jul 2009 16:29:15 -0000
@@ -42,8 +42,9 @@
 /* Version at end of Hello
  *   prior to 10Mar08 no version
  *   1 10Mar08
+ *   2 13Mar09 - added the ability to restore from multiple storages
  */
-static char OK_hello[]  = "2000 OK Hello 1\n";
+static char OK_hello[]  = "2000 OK Hello 2\n";
 static char Dir_sorry[] = "2999 Authentication failed.\n";
 static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
 
Index: WORK/src/filed/job.c
===================================================================
RCS file: /cvs/netpilot/GPL/bacula-3.0.1/WORK/src/filed/job.c,v
retrieving revision 1.1
diff -u -r1.1 job.c
--- WORK/src/filed/job.c	16 Jun 2009 15:14:46 -0000	1.1
+++ WORK/src/filed/job.c	2 Jul 2009 16:29:15 -0000
@@ -63,6 +63,7 @@
 static int level_cmd(JCR *jcr);
 static int verify_cmd(JCR *jcr);
 static int restore_cmd(JCR *jcr);
+static int end_restore_cmd(JCR *jcr);
 static int storage_cmd(JCR *jcr);
 static int session_cmd(JCR *jcr);
 static int response(JCR *jcr, BSOCK *sd, char *resp, const char *cmd);
@@ -97,6 +98,7 @@
    {"JobId=",       job_cmd,       0},
    {"level = ",     level_cmd,     0},
    {"restore",      restore_cmd,   0},
+   {"endrestore",   end_restore_cmd, 0},
    {"session",      session_cmd,   0},
    {"status",       status_cmd,    1},
    {".status",      qstatus_cmd,   1},
@@ -112,8 +114,8 @@
 };
 
 /* Commands received from director that need scanning */
-static char jobcmd[]      = "JobId=%d Job=%127s SDid=%d SDtime=%d Authorization=%100s";
-static char storaddr[]    = "storage address=%s port=%d ssl=%d";
+static char jobcmd[]      = "JobId=%d Job=%127s SDid=%d SDtime=%d";
+static char storaddr[]    = "storage address=%s port=%d ssl=%d Authorization=%100s";
 static char sessioncmd[]  = "session %127s %ld %ld %ld %ld %ld %ld\n";
 static char restorecmd[]  = "restore replace=%c prelinks=%d where=%s\n";
 static char restorecmd1[] = "restore replace=%c prelinks=%d where=\n";
@@ -137,6 +139,7 @@
 static char OKrestore[]   = "2000 OK restore\n";
 static char OKsession[]   = "2000 OK session\n";
 static char OKstore[]     = "2000 OK storage\n";
+static char OKstoreend[]  = "2000 OK storage end\n";
 static char OKjob[]       = "2000 OK Job %s (%s) %s,%s,%s";
 static char OKsetdebug[]  = "2000 OK setdebug=%d\n";
 static char BADjob[]      = "2901 Bad Job\n";
@@ -457,21 +460,15 @@
 static int job_cmd(JCR *jcr)
 {
    BSOCK *dir = jcr->dir_bsock;
-   POOLMEM *sd_auth_key;
 
-   sd_auth_key = get_memory(dir->msglen);
    if (sscanf(dir->msg, jobcmd,  &jcr->JobId, jcr->Job,
-              &jcr->VolSessionId, &jcr->VolSessionTime,
-              sd_auth_key) != 5) {
+              &jcr->VolSessionId, &jcr->VolSessionTime) != 4) {
       pm_strcpy(jcr->errmsg, dir->msg);
       Jmsg(jcr, M_FATAL, 0, _("Bad Job Command: %s"), jcr->errmsg);
       dir->fsend(BADjob);
-      free_pool_memory(sd_auth_key);
       return 0;
    }
-   jcr->sd_auth_key = bstrdup(sd_auth_key);
-   free_pool_memory(sd_auth_key);
-   Dmsg2(120, "JobId=%d Auth=%s\n", jcr->JobId, jcr->sd_auth_key);
+   Dmsg1(120, "JobId=%d\n", jcr->JobId);
    Mmsg(jcr->errmsg, "JobId=%d Job=%s", jcr->JobId, jcr->Job);
    new_plugins(jcr);                  /* instantiate plugins for this jcr */
    generate_plugin_event(jcr, bEventJobStart, (void *)jcr->errmsg);
@@ -1370,16 +1367,35 @@
 {
    int stored_port;                /* storage daemon port */
    int enable_ssl;                 /* enable ssl to sd */
+   POOLMEM *sd_auth_key;
    BSOCK *dir = jcr->dir_bsock;
    BSOCK *sd;                         /* storage daemon bsock */
 
+   /* We can be contacting multiple storage daemons.
+      So, make sure that any old jcr->store_bsock is cleaned up. */
+   if(jcr->store_bsock) {
+      jcr->store_bsock->destroy();
+      jcr->store_bsock = NULL;
+   }
+
    Dmsg1(100, "StorageCmd: %s", dir->msg);
-   if (sscanf(dir->msg, storaddr, &jcr->stored_addr, &stored_port, &enable_ssl) != 3) {
+   sd_auth_key = get_memory(dir->msglen);
+   if (sscanf(dir->msg, storaddr, &jcr->stored_addr, &stored_port, &enable_ssl, sd_auth_key) != 4) {
       pm_strcpy(jcr->errmsg, dir->msg);
       Jmsg(jcr, M_FATAL, 0, _("Bad storage command: %s"), jcr->errmsg);
+      free_pool_memory(sd_auth_key);
       goto bail_out;
    }
-   Dmsg3(110, "Open storage: %s:%d ssl=%d\n", jcr->stored_addr, stored_port, enable_ssl);
+
+   /* We can be contacting multiple storage daemons.
+      So, make sure that any old jcr->sd_auth_key is cleaned up. */
+   if(jcr->sd_auth_key) {
+      bfree(jcr->sd_auth_key);
+   }
+   jcr->sd_auth_key = bstrdup(sd_auth_key);
+   free_pool_memory(sd_auth_key);
+
+   Dmsg4(110, "Open storage: %s:%d ssl=%d %s\n", jcr->stored_addr, stored_port, enable_ssl, jcr->sd_auth_key);
    /* Open command communications with Storage daemon */
    /* Try to connect for 1 hour at 10 second intervals */
    sd = bnet_connect(jcr, 10, (int)me->SDConnectTimeout, me->heartbeat_interval,
@@ -1773,12 +1789,27 @@
 
 bail_out:
 
+   if (jcr->where) {
+      bfree(jcr->where);
+      jcr->where=NULL;
+   }
+
    if (jcr->JobErrors) {
       set_jcr_job_status(jcr, JS_ErrorTerminated);
+      Dmsg0(130, "Done in job.c\n");
+      return 0;
    }
 
    Dmsg0(130, "Done in job.c\n");
+
+   /* Tell the director that this storage is finished. Maybe it will give us
+      another. */
+   return dir->fsend(OKstoreend);
+}
+
+static int end_restore_cmd(JCR *jcr) {
    generate_plugin_event(jcr, bEventEndRestoreJob);
+
    return 0;                          /* return and terminate command loop */
 }
 