On Wednesday 09 December 2009 18:51:27 Jesper Krogh wrote:
> Kern Sibbald wrote:
> > On Tuesday 08 December 2009 20:21:18 Jesper Krogh wrote:
> >> Eric Bollengier wrote:
> >>> Hello,
> >>>
> >>> Maybe it could be possible to make the restore size information
> >>> available to the ClientBeforeJob runscript, then it will be possible to
> >>> run a specific script that can do whatever you want. IMHO, Bacula won't
> >>> implement directly this kind of feature.
> >>
> >> I'll accept that it is not high on the priority list (not likely to be
> >> implemented).
> >>
> >> But the lack of this functionality combined with 2.5 hours for "starting
> >> the job" and a default for getting on with the job set to yes, just
> >> caused us 2 hours of down-time on a production system the other day.
> >
> > Sorry to hear that.
> >
> > I suspect that you need some tuning of your Bacula server and catalog
> > server. The times you cited for the volume seem to me to indicate an
> > underpowered catalog server ...
>
> That was also my "first shot", but bacula-dir does hang in 100% cpu load
> during the time, with no catalog activity at all (measured using strace
> on the director-thread and mysqladmin processlist in the db).

The only way to move forward is for you to trap it at the place where it is
looping. After that, it is generally a matter of turning on or, more often,
adding debug code until the problem is found.

>
> Building the initial interactive tree takes less than 40s. for 4m files,
> which is not that bad in my eyes. The database is running on an 8 core,
> 48GB memory machine, so the hardware is not really underpowered. (and
> since I see no database activity I have problems shooting at mysql).

OK

>
> I don't have any solid evidence, but I even tend to believe that I have
> tested this on an earlier version of bacula and that didn't have the
> problem.

I hear that quite often. It is more likely that you have hit some "corner
case" since I don't think we have changed any of that code in *many* versions.
Once we know where it is looping, we will know within a few minutes what was
changed if anything.

>
> I just did another test. I have other volumes with many files, so now I
> did the steps for another volume marked 10m files, it took less than 1
> minute to do the same process.
>
> So the difference must be where? (in the data structure, in the database?)

Most likely the data has been corrupted or the algorithm has a flaw that does
not trigger very often.

>
> Jesper

_______________________________________________
Bacula-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-devel
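
One possible way to "trap it at the place where it is looping", as suggested in the reply above, is to attach gdb to the spinning bacula-dir process and dump a backtrace of every thread. The sketch below is only an illustration and not a procedure taken from this thread: it assumes gdb is installed, that the caller is allowed to attach to the process, and that the binary carries debug symbols. The helper names `gdb_backtrace_command` and `capture_backtrace` are hypothetical.

```python
import subprocess


def gdb_backtrace_command(pid):
    """Build a gdb invocation that dumps every thread's stack and exits.

    Hypothetical helper for illustration; pid is the process ID of the
    looping daemon (for example bacula-dir).
    """
    return [
        "gdb",
        "-p", str(pid),                 # attach to the already running process
        "-batch",                       # run the -ex commands, then exit
        "-ex", "thread apply all bt",   # backtrace of every thread
    ]


def capture_backtrace(pid):
    """Run gdb against pid and return whatever it printed on stdout.

    Assumes gdb is on PATH and the caller has permission to attach.
    """
    result = subprocess.run(
        gdb_backtrace_command(pid),
        capture_output=True,
        text=True,
        check=False,  # gdb may exit non-zero; the caller inspects the output
    )
    return result.stdout
```

Calling `capture_backtrace(pid)` a few times while the CPU sits at 100% and comparing the outputs should show which function keeps reappearing at the top of the busy thread's stack, which is the information needed before turning on or adding debug code.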
