>>>>> On Wed, 15 Dec 2010 10:15:09 +0100, Robert Wirth said:

> > On Tue, December 14, 2010 11:48 am, Robert Wirth wrote:
> > > Hi,
> > >
> > > Strange problem. Here's some hardware where Bacula has been running
> > > successfully for about 5 years: release 1.38.11 under Solaris 10 x86.
> > >
> > > Last month, we had a system disk crash on the backup server. No
> > > backup data was lost; we just had to reinstall the backup system.
> > > Since this was our only Solaris x86 system, we decided to migrate
> > > to Linux and to a newer Bacula release. Until the repaired hardware
> > > was available, we started with a virtualized new system, just for
> > > the daily incremental backups to disk volumes.
> > >
> > > Since most of our current systems are Ubuntu Hardy LTS servers, we
> > > chose this distribution's Bacula 2.2.8 as our new version (old, yes,
> > > but 1.38.11 was running well, and 2.2.8 was the default).
> > >
> > > We upgraded Bacula's MySQL database from 1.38.11 to 2.2.8 with the
> > > corresponding script, and imported the updated DB via mysqldump into
> > > the new system, which runs MySQL 5.1.41 on Linux kernel 2.6.32.
> > > The virtualized system worked well all the time.
> > >
> > > Now the hardware version of the system is ready, and a yearly full
> > > backup, which goes directly to tape, is imminent.
> > >
> > > And now the strange things begin...
> > >
> > > /* The system is a 2x2-core AMD Opteron box, 4 GB RAM, 6x LSI SCSI
> > >    U320 MegaRAID with separate channels for external disks, tape
> > >    drives and the autochanger. 23 TB disk storage on external RAIDs,
> > >    autochanger and HP drives for LTO-3 tapes. OS: see above. */
> > >
> > > NOW BACKING UP...
> > >
> > > Starting a bunch of full backup jobs which fit onto 1 SINGLE TAPE
> > > produces NO PROBLEMS: the jobs start, run, write, and terminate
> > > within the usual span of time.
> > > This way, I can back up a dozen systems totalling 360 GB onto one
> > > tape in a few hours.
> > >
> > > FACING THE PROBLEM...
> > >
> > > Starting a bunch of full backup jobs that DO NOT FIT onto 1 single
> > > tape proceeds as follows (with a fresh tape forced by setting the
> > > former one to read-only):
> > >
> > > - First, the jobs run well and write their data to the first fresh
> > >   tape of the corresponding pool. Speed is similar to what we knew
> > >   from the old OS.
> > >
> > > - When the tape is full, with around 600 GB of data, it is marked
> > >   Full and unloaded, and the next free tape of the pool is loaded.
> > >
> > > - From this moment on, writing to the new fresh tape becomes
> > >   incredibly slow (4 GB/hour), and mysqld constantly has 95%-100%
> > >   CPU load. No other process has a significant load, and the mysql
> > >   load isn't reflected in the system's load values:
> > >
> > > Cpu(s): 3.3%us, 2.2%sy, 0.0%ni, 91.6%id, 2.1%wa, 0.1%hi, 0.7%si, 0.0%st
> > > Mem:  3961616k total, 3850072k used, 111544k free, 17532k buffers
> > > Swap: 3906552k total, 0k used, 3906552k free, 3579956k cached
> > >
> > >   PID USER  PR NI VIRT RES  SHR S %CPU %MEM   TIME+   COMMAND
> > >  1356 mysql 20  0 144m 31m 2376 S   98  0.8 163:57.79 mysqld
> > >     1 root  20  0 2620 948  528 S    0  0.0   0:00.63 init
> > >     2 root  20  0    0   0    0 S    0  0.0   0:00.00 kthreadd
> > > ....
> > >
> > > The only further effect I can see is that the table "bacula.JobMedia"
> > > keeps growing. No errors in the system log, no MySQL errors, and
> > > none in Bacula's log.
> > >
> > > What I mainly don't understand is why this happens after a tape
> > > change. The MaxSpoolSize is 32 GB, and I'm backing up 7 systems.
> > > Each of them went through several spool cycles during the first tape.
> > >
> > > From the view of Bacula and its program logic, what has changed when
> > > the tape has been changed?
> > > I guess it's all the same: spooling data, writing it to tape and
> > > updating the catalog, regardless of first, second or later tape...?!?

> > What do you see under Running Jobs in the 'status dir' output before
> > and after the first tape has filled?
> >
> > If you have only the 'after' just now, that might be interesting.

> Will try to see this on the next try.
>
> /* Current state: the second tape has been loaded and in use since
>    yesterday 02:14 p.m., and really only 29 GB have been written :-((( */
>
> And I wrote something wrong earlier: there IS something strange in the
> Bacula log. Oooh.
>
> Yesterday, I started the 7 not-fitting big backup jobs around 9:45 a.m.
> The first tape was filled with 540 GB by 02:45 p.m.:
>
> | 833 | 000090L1 | Full | 1 | 543,602,949,120 | 544 | 63,072,000 | 0 | 1 | 1 | LTO-3 | 2010-12-14 14:45:52 |
>
> During this time, I could follow the spooling via messages in bconsole.
> It all looked correct: parallel spooling and writing of data.
>
> /* PARENTHESIS:
>    Current state: the second tape was loaded yesterday 02:45 p.m., and
>    only 29 GB have been written in the 18 hours since then :-(((
>    | 834 | 000098L1 | Append | 1 | 29,998,080,000 | 30 | 63,072,000 | 0 | 4 | 1 | LTO-3 | 2010-12-15 08:14:46 |
>    END OF PARENTHESIS */
>
> In the log, there's only information about the job which I started
> first (lnv-102). No word about the other 6 jobs and their spooling!
> And the logfile stops at 01:22 p.m., more than one hour before the
> tape was full and changed:
>
> 14-Dec 09:44 bup-serv-dir JobId 93518: Start Backup JobId 93518,
>   Job=lnv-102.2010-12-14_09.44.00
> 14-Dec 09:46 bup-serv-dir JobId 93518: Using Device "Tape1Left"
> 14-Dec 09:47 bup-serv-sd JobId 93518: Spooling data ...
> 14-Dec 10:21 bup-serv-sd JobId 93518: User specified spool size reached.
> 14-Dec 10:21 bup-serv-sd JobId 93518: Writing spooled data to Volume.
>   Despooling 32,000,031,721 bytes ...
> 14-Dec 11:24 bup-serv-sd JobId 93518: Despooling elapsed time = 00:17:28,
>   Transfer rate = 30.53 M bytes/second
> 14-Dec 11:24 bup-serv-sd JobId 93518: Spooling data again ...
> 14-Dec 11:43 bup-serv-sd JobId 93518: User specified spool size reached.
> 14-Dec 11:43 bup-serv-sd JobId 93518: Writing spooled data to Volume.
>   Despooling 32,000,032,040 bytes ...
> 14-Dec 12:58 bup-serv-sd JobId 93518: Despooling elapsed time = 00:14:45,
>   Transfer rate = 36.15 M bytes/second
> 14-Dec 12:58 bup-serv-sd JobId 93518: Spooling data again ...
> 14-Dec 13:22 bup-serv-sd JobId 93518: User specified spool size
> [END OF FILE]
>
> During this time, 17 spool sessions of all 7 systems, 32 GB each, were
> written to tape, not just the 2 spools of one system that are logged here.
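Just to put numbers on it: the despool rate in your quoted log is consistent with its byte count and elapsed time, while your own media listing puts the post-change rate well below 1 MB/s. A quick back-of-the-envelope check, using only figures from the quoted output:

```shell
# First despool run in the log: 32,000,031,721 bytes in 00:17:28
awk 'BEGIN { printf "%.2f MB/s\n", 32000031721 / (17*60 + 28) / 1e6 }'

# Rate since the tape change: 29,998,080,000 bytes in ~18 hours
awk 'BEGIN { printf "%.2f MB/s\n", 29998080000 / (18*3600) / 1e6 }'
```

The first prints 30.53 MB/s, matching the log line exactly; the second comes out around 0.46 MB/s, i.e. under 2 GB/hour -- roughly two orders of magnitude slower once the second tape is loaded.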
It looks like the log stream isn't being flushed to the file. Can you
try the bconsole messages command instead?

> "status dir" tells me that this one job has terminated, while the
> other 6 jobs are still marked as running:
>
>  93518 Full  lnv-102.2010-12-14_09.44.00 has terminated
>
> But nevertheless, the last spool file (25 GB) of this job is still
> present in the spool directory:
>
> -rw-r----- 1 root bacula 25590119686 2010-12-14 14:48
>   bup-serv-sd.data.93518.lnv-102.2010-12-14_09.44.00.Tape1Left.spool
>
> And the spool metadata file has just 100 MB:
>
> -rw-r----- 1 root bacula 103063552 2010-12-14 14:48
>   bup-serv-sd.attr.lnv-102.2010-12-14_09.44.00.5.spool
>
> bacula-sd holds these two files open, as lsof shows:
>
> bacula-sd 2150 root 13u REG 251,5 25590119686     13
>   /var/services/bacula/spool1/bup-serv-sd.data.93518.lnv-102.2010-12-14_09.44.00.Tape1Left.spool
> bacula-sd 2150 root 14u REG 251,1   103063552 131160
>   /var/services/bacula/work/bup-serv-sd.attr.lnv-102.2010-12-14_09.44.00.5.spool
>
> but there is no longer a connection to the FD of that system. This is
> OK, since all its data has been spooled. For the 6 other jobs,
> connections AND spool files are present. Correct.
>
> Spooling of lnv-102 has terminated, but its last spool isn't yet
> written to tape. So there should be NO HEAVY mysql operations yet,
> since no job has been completely despooled. Right?
>
> Where does the mysql load come from? Remember that the bunch of
> tape-fitting jobs wrote 360 GB in a few hours, including DB operations.

Can you turn on mysql query logging?

__Martin
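P.S. In case it helps: on MySQL 5.1 the general query log can be enabled at runtime, no server restart needed. A sketch (the log path is just an example, and the SHOW INDEX checks are only a suggestion of mine, not something Bacula requires you to run):

```sql
-- Run as a privileged MySQL user; pick any writable log path.
SET GLOBAL general_log_file = '/tmp/mysql-query.log';
SET GLOBAL general_log = 'ON';

-- While you're in there, it may be worth confirming that the
-- 1.38 -> 2.2 catalog upgrade script left the indexes on the hot
-- tables in place; unindexed JobMedia/attribute inserts would fit
-- the symptoms of a pegged mysqld and a growing JobMedia table.
SHOW INDEX FROM JobMedia;
SHOW INDEX FROM File;
```

Turn the general log off again afterwards (SET GLOBAL general_log = 'OFF';), since it logs every statement and will itself cost some throughput.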
_______________________________________________
Bacula-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-users
