Hello,

I am sorry, but this email is not something we can act on because:

1. The background is snipped, and the problem is not described sufficiently
   (see www.bacula.org -> Bugs for information; e.g. we don't even know what
   version of Bacula you are using).

2. Unfortunately your patch will not help you. It is a NOOP -- i.e. it
   doesn't change anything since at the points you are testing
   NumConcurrentJobs is *guaranteed* to be non-zero (at least in the current
   code base).

Best regards,

Kern

On Sunday 09 March 2008 01.34:06 Peter Much wrote:
> <[EMAIL PROTECTED]> aka Arno Lehmann wrote
> on Sun, 02 Mar 2008 12:50:17 +0100 in m2n.bacula.devel:
> |> 2. This is the thing that I have been worrying the most about. I
> |> have been following various theories about what might happen
> |> there, yet to no avail. The last of my theories was that it might
> |> have to do with the migrations, but currently I tend to dismiss
> |> this theory also. In fact, I am still clueless.
> |> What happens is that the Director puts all jobs (and all newly
> |> started jobs) into either "waiting on max Storage jobs" or
> |> "waiting execution", while there is no job running on any client
> |> and no job running on the SD. It just does nothing and has to
> |> be restarted.
> |
> |That definitely qualifies as a bug... have you tried looking at the
> |debug output, once the DIR is in this state?
>
> This was a good hint. The debug shows this:
>
> >BxDir: jcr.c:603-0 OnEntry JobStatus=s set=s
> >BxDir: jcr.c:623-0 OnExit JobStatus=s set=s
> >BxDir: jobq.c:701-0 Wstore=Files
> >BxDir: jobq.c:723-0 Fail wncj=-2
>
> And what I also have seen is rncj=-2, and rncj=3.
>
> Looking into jobq.c, I find that rncj is never supposed to take any
> value except 0 and 1 (maximum one read job per device).
> OTOH, I find that rncj is not a unique entity - it is just the
> NumConcurrentJobs of any Storage device.
>
> So, this seems not to be a migration issue, it seems to be a problem
> with multidrive autoloaders.
>
> According to the manual, since Bacula version 1.whatever an
> autoloader has to be defined as a single device in the DIR.
> So, if this autoloader has multiple drives, it is well possible
> that these drives are used for reading AND writing at the same time.
>
> And this seems to break the rncj/wncj logic. My current most likely
> interpretation runs that way: Suppose we have one restore running:
> rncj=1. Then we get two backups running: wncj=rncj=3. Then the
> restore terminates and sets rncj=0. So, when the two backup
> jobs terminate, it goes to -2 - and this is where the show ends.
>
> I am now trying the following as a fix, and see if it helps.
>
> rgds,
> PMc
>
> --- src/dird/jobq.c.orig Mon Dec 10 18:54:41 2007
> +++ src/dird/jobq.c Sun Mar 9 00:27:02 2008
> @@ -478,7 +478,8 @@
>         */
>        if (jcr->acquired_resource_locks) {
>           if (jcr->rstore) {
> -            jcr->rstore->NumConcurrentJobs = 0;
> +            if (jcr->rstore->NumConcurrentJobs > 0)
> +               jcr->rstore->NumConcurrentJobs--;
>              Dmsg1(200, "Dec rncj=%d\n", jcr->rstore->NumConcurrentJobs);
>           }
>           if (jcr->wstore) {
> @@ -738,7 +739,8 @@
>              Dmsg1(200, "Dec wncj=%d\n", jcr->wstore->NumConcurrentJobs);
>           }
>           if (jcr->rstore) {
> -            jcr->rstore->NumConcurrentJobs = 0;
> +            if (jcr->rstore->NumConcurrentJobs > 0)
> +               jcr->rstore->NumConcurrentJobs--;
>              Dmsg1(200, "Dec rncj=%d\n", jcr->rstore->NumConcurrentJobs);
>           }
>           set_jcr_job_status(jcr, JS_WaitClientRes);
> @@ -753,7 +755,8 @@
>              Dmsg1(200, "Dec wncj=%d\n", jcr->wstore->NumConcurrentJobs);
>           }
>           if (jcr->rstore) {
> -            jcr->rstore->NumConcurrentJobs = 0;
> +            if (jcr->rstore->NumConcurrentJobs > 0)
> +               jcr->rstore->NumConcurrentJobs--;
>              Dmsg1(200, "Dec rncj=%d\n", jcr->rstore->NumConcurrentJobs);
>           }
>           jcr->client->NumConcurrentJobs--;
>
>
> -------------------------------------------------------------------------
> This SF.net email is sponsored by: Microsoft
> Defy all challenges. Microsoft(R) Visual Studio 2008.
> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> _______________________________________________
> Bacula-devel mailing list
> Bacula-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/bacula-devel
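
For readers following the thread, the counter scenario Peter describes
(one restore, then two backups, all sharing one Storage resource) can be
walked through with a small stand-alone sketch. This is not Bacula code.
Only the field name NumConcurrentJobs comes from the thread; the struct
and every function name below are hypothetical, and the sketch assumes,
as Peter's theory does, that the read and write sides point at the same
counter. Whether that can actually happen in the Director is exactly the
point under dispute above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a Storage resource; only the field name
 * NumConcurrentJobs is taken from the thread. */
struct store {
    int NumConcurrentJobs;
};

/* Any job (read or write) acquiring the resource bumps the counter. */
static void start_job(struct store *s)
{
    assert(s != NULL);
    s->NumConcurrentJobs++;
}

/* A finishing write job decrements the counter. */
static void end_write_job(struct store *s)
{
    assert(s != NULL);
    s->NumConcurrentJobs--;
}

/* A finishing read job resets the counter to zero, as in the lines
 * removed by the quoted patch. */
static void end_read_job_reset(struct store *s)
{
    assert(s != NULL);
    s->NumConcurrentJobs = 0;
}

/* A finishing read job decrements only while the counter is positive,
 * as in the lines added by the quoted patch. */
static void end_read_job_guarded(struct store *s)
{
    assert(s != NULL);
    if (s->NumConcurrentJobs > 0)
        s->NumConcurrentJobs--;
}

/* Replays the scenario from the thread against ONE shared counter and
 * returns its final value. use_guard selects the patched behaviour. */
int simulate_shared_counter(int use_guard)
{
    struct store autochanger = { 0 };

    start_job(&autochanger);          /* restore starts: 1 */
    start_job(&autochanger);          /* first backup:   2 */
    start_job(&autochanger);          /* second backup:  3 */

    if (use_guard)
        end_read_job_guarded(&autochanger);   /* restore ends: 2 */
    else
        end_read_job_reset(&autochanger);     /* restore ends: 0 */

    end_write_job(&autochanger);      /* first backup ends  */
    end_write_job(&autochanger);      /* second backup ends */

    return autochanger.NumConcurrentJobs;
}
```

Under these assumptions simulate_shared_counter(0) returns -2, matching
the wncj=-2 in the debug output, and simulate_shared_counter(1) returns 0.
The sketch says nothing about whether the real code paths can reach this
state.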