Is there any reason not to update to Bacula v7? It contains a number of fixes as well as new features. The version you are running is nearly two years old; there were a few bug fixes along the way, but no updates since April 2014.
Patti Clark
Linux System Administrator
R&D Systems Support
Oak Ridge National Laboratory

From: Robert Heinzmann <[email protected]>
Date: Tuesday, March 3, 2015 at 3:37 AM
To: "[email protected]" <[email protected]>
Subject: [Bacula-users] Bacula SD 5.2.13 crash - Mutex lock failure. ERR=Invalid argument

Hello,

we are using Bacula 5.2.13-18 on CentOS6, and from time to time bacula-sd crashes with the following error, causing all backups to fail until bacula-sd is started again:

Mar 3 06:59:00 XXXX bacula-sd: XXXX:storage:default: ABORTING due to ERROR in lockmgr.c:100#012Mutex lock failure. ERR=Invalid argument
Mar 3 06:59:00 XXXX bacula-sd: Bacula interrupted by signal 6: IOT trap

Setup:

3 Servers:
  1 Bacula Director (extra machine)
  1 Bacula Catalog Server (extra machine)
  1 Bacula Storage Daemon (extra machine)

We have ~573 Jobs (some TB, all Full Backups) to back up each day. Jobs are distributed across the day depending on minimum load of the server, distributed evenly otherwise:

Time           Jobs
0:00-1:00      35
1:00-2:00      121
2:00-3:00      93
3:00-4:00      60
4:00-5:00      46
5:00-6:00      71
6:00-7:00      60
7:00-8:00      43
8:00-9:00      32
9:00-10:00     12
10:00-11:00    7
11:00-12:00    3
12:00-13:00    5
13:00-14:00    2
14:00-15:00    7
15:00-16:00    8
16:00-17:00    7
17:00-18:00    3
18:00-19:00    2
19:00-20:00    3
20:00-21:00    11
21:00-22:00    14
22:00-23:00    28
23:00-24:00    25

Our SD is configured with 20 virtual drives in a backup2disk setup allowing 20 concurrent backups to disk. Each Backup Job is an individual file in the backend (so full backups can be accessed and restored through bls/bextract). We have an external "scripted" job, which cleans up unused / purged volumes from disk.

Bacula Director Configuration:
------------------------------

Storage {
  Name = "XXXX:storage:default"
  Address = HOSTNAME_OF_THE_SD_MACHINE
  Password = "SECRET"
  Device = "FileStorage"
  Maximum Concurrent Jobs = 20
  Media Type = File
  Heartbeat Interval = 15
  TLS Enable = no
}

Pool {
  Name = " HOSTNAME_OF_THE_SD_MACHINE:pool:default"
  Storage = "XXXX:storage:default"
  # All Volumes will have the format standard.date.time to ensure they
  # are kept unique throughout the operation and also aid quick analysis
  # We won't use a counter format for this at the moment.
  Label Format = "BACULA-${Job}.${Year}${Month:p/2/0/r}${Day:p/2/0/r}.${Hour:p/2/0/r}${Minute:p/2/0/r}.${JobId}"
  Pool Type = Backup
  # Clean up any we don't need, and keep them for a maximum of a month (in
  # theory the same time period for weekly backups from the clients)
  # Note the files for the old volumes will still remain on the disk but will
  # be truncated to a zero size.
  Recycle = No
  Auto Prune = Yes
  Action On Purge = Truncate
  Volume Retention = 30 days
  # Don't allow re-use of volumes; one volume per job only
  Maximum Volume Jobs = 1
}

Bacula SD Configuration:
------------------------------

Autochanger {
  Name = "FileStorage"
  Changer Device = /dev/null
  Changer Command = ""
  Device = FileStorage-sd-0
  Device = FileStorage-sd-1
  Device = FileStorage-sd-2
  Device = FileStorage-sd-3
  Device = FileStorage-sd-4
  Device = FileStorage-sd-5
  Device = FileStorage-sd-6
  Device = FileStorage-sd-7
  Device = FileStorage-sd-8
  Device = FileStorage-sd-9
  Device = FileStorage-sd-10
  Device = FileStorage-sd-11
  Device = FileStorage-sd-12
  Device = FileStorage-sd-13
  Device = FileStorage-sd-14
  Device = FileStorage-sd-15
  Device = FileStorage-sd-16
  Device = FileStorage-sd-17
  Device = FileStorage-sd-18
  Device = FileStorage-sd-19
  Device = FileStorage-sd-20
}

Autochanger {
  Name = "FileStorage-restore"
  Changer Device = /dev/null
  Changer Command = ""
  Device = FileStorage-sd-restore-0
  Device = FileStorage-sd-restore-1
  Device = FileStorage-sd-restore-2
  Device = FileStorage-sd-restore-3
  Device = FileStorage-sd-restore-4
  Device = FileStorage-sd-restore-5
  Device = FileStorage-sd-restore-6
  Device = FileStorage-sd-restore-7
  Device = FileStorage-sd-restore-8
  Device = FileStorage-sd-restore-9
  Device = FileStorage-sd-restore-10
  Device = FileStorage-sd-restore-11
  Device = FileStorage-sd-restore-12
  Device = FileStorage-sd-restore-13
  Device = FileStorage-sd-restore-14
  Device = FileStorage-sd-restore-15
  Device = FileStorage-sd-restore-16
  Device = FileStorage-sd-restore-17
  Device = FileStorage-sd-restore-18
  Device = FileStorage-sd-restore-19
  Device = FileStorage-sd-restore-20
}

Backup Drives like this:

Device {
  Name = FileStorage-sd-0    # Add a hyphen to SD/autochanger name & match with drive index
  Device Type = File
  Media Type = File          # unique to each archive device path, different path, different mediatype
  Archive Device = /bacula/data01
  AutomaticMount = yes
  AlwaysOpen = yes
  RemovableMedia = yes
  Autochanger = yes
  Drive Index = 0
  Maximum Concurrent Jobs = 1
  Volume Poll Interval = 5
  LabelMedia = yes
  Spool Directory = /bacula/spool01
  Autoselect = yes
  Maximum Network Buffer Size = 65536
}

… 18 more…

Device {
  Name = FileStorage-sd-20   # Add a hyphen to SD/autochanger name & match with drive index
  Device Type = File
  Media Type = File          # unique to each archive device path, different path, different mediatype
  Archive Device = /bacula/data01
  AutomaticMount = yes
  AlwaysOpen = yes
  RemovableMedia = yes
  Autochanger = yes
  Drive Index = 20
  Maximum Concurrent Jobs = 1
  Volume Poll Interval = 5
  LabelMedia = yes
  Spool Directory = /bacula/spool01
  Autoselect = yes
  Maximum Network Buffer Size = 65536
}

Restore Drives like this:

Device {
  Name = FileStorage-sd-restore-0   # Add a hyphen to SD/autochanger name & match with drive index
  Device Type = File
  Media Type = File                 # unique to each archive device path, different path, different mediatype
  Archive Device = /bacula/data01
  AutomaticMount = yes
  AlwaysOpen = yes
  RemovableMedia = yes
  Autochanger = yes
  Drive Index = 0
  Maximum Concurrent Jobs = 1
  Volume Poll Interval = 5
  LabelMedia = yes
  Spool Directory = /bacula/spool01
  Autoselect = no
  Maximum Network Buffer Size = 65536
}

Any idea what's causing the bacula-sd crash? How can we debug further?

Regards,
Robert
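
The Device resources quoted above differ only in Name, Drive Index and Autoselect, so a layout like this is often generated rather than typed by hand. The following is a minimal Python sketch of that idea, not the poster's actual tooling. It assumes that the elided devices follow exactly the pattern of the three shown (FileStorage-sd-0, FileStorage-sd-20, FileStorage-sd-restore-0) and that each autochanger has 21 drives (indices 0 through 20, as listed in the Autochanger resources). The function names are hypothetical and not part of Bacula.

```python
# Sketch: generate the repetitive bacula-sd Autochanger/Device resources.
# Assumption: every elided device matches the pattern of the devices shown
# in the message; only Name, Drive Index and Autoselect vary.

DEVICE_TEMPLATE = """Device {{
  Name = {name}
  Device Type = File
  Media Type = File
  Archive Device = /bacula/data01
  AutomaticMount = yes
  AlwaysOpen = yes
  RemovableMedia = yes
  Autochanger = yes
  Drive Index = {index}
  Maximum Concurrent Jobs = 1
  Volume Poll Interval = 5
  LabelMedia = yes
  Spool Directory = /bacula/spool01
  Autoselect = {autoselect}
  Maximum Network Buffer Size = 65536
}}
"""


def render_device(prefix, index, autoselect):
    """Render one Device resource; prefix and index form the device name."""
    return DEVICE_TEMPLATE.format(
        name=f"{prefix}-{index}",
        index=index,
        autoselect="yes" if autoselect else "no",
    )


def render_autochanger(name, prefix, count):
    """Render an Autochanger resource listing `count` devices (0..count-1)."""
    lines = [
        "Autochanger {",
        f'  Name = "{name}"',
        "  Changer Device = /dev/null",
        '  Changer Command = ""',
    ]
    lines += [f"  Device = {prefix}-{i}" for i in range(count)]
    lines.append("}")
    return "\n".join(lines) + "\n"


def render_sd_config(count=21):
    """Render both autochangers plus their backup and restore devices."""
    parts = [
        render_autochanger("FileStorage", "FileStorage-sd", count),
        render_autochanger("FileStorage-restore", "FileStorage-sd-restore", count),
    ]
    # Backup drives are autoselected; restore drives are not.
    parts += [render_device("FileStorage-sd", i, True) for i in range(count)]
    parts += [render_device("FileStorage-sd-restore", i, False) for i in range(count)]
    return "\n".join(parts)


if __name__ == "__main__":
    print(render_sd_config())
```

Generating the blocks from one template makes it harder for a single device to drift out of sync with the others (for example a mismatched Drive Index), which is worth ruling out before looking for causes of the crash elsewhere.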
