I'm not following you. The 'loaded' parameter is never changed in that function unless loaded < 0, which it isn't on entry. This is being called from autoload_device() (autoload.c:164-170), where the autoloader reports a different slot from the one the volume is assumed to occupy. The comments in the function itself indicate that loaded may contain a value higher than 0.
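To make the two conditions concrete, here is a minimal standalone sketch. It is not Bacula code: the Drive struct, the function names and slot 17 are hypothetical, and it assumes dev->get_slot() still holds the slot bacula-sd expects at the point of the test, which is exactly the point in dispute in this thread. Only the two conditions themselves are taken from the patch.

```cpp
// Standalone model of the two tests, NOT Bacula code.
// Drive, should_free_old, should_free_new and check_scenarios are
// hypothetical names used only for illustration.
#include <cassert>

struct Drive {
    // Stands in for dev->get_slot(): the slot bacula-sd believes is in the
    // drive. Assumption: this value has not been reset before the test runs.
    int expected_slot;
};

// Original condition: free the drive's volume whenever the changer
// reported any loaded slot greater than zero.
constexpr bool should_free_old(const Drive&, int loaded) {
    return loaded > 0;
}

// Patched condition: free the drive's volume only when the slot reported
// by the changer matches the slot bacula-sd expects in the drive.
constexpr bool should_free_new(const Drive& d, int loaded) {
    return d.expected_slot == loaded;
}

inline void check_scenarios() {
    // bacula-sd expects slot 2694, the changer reports slot 17 (made up).
    const Drive mismatch{2694};
    assert(should_free_old(mismatch, 17));    // old test frees the volume
    assert(!should_free_new(mismatch, 17));   // patched test leaves it alone

    // Changer and bacula-sd agree: both tests free the volume.
    const Drive match{2694};
    assert(should_free_old(match, 2694));
    assert(should_free_new(match, 2694));

    // Slot zero: the old test never fires, the patched test does.
    const Drive zero{0};
    assert(!should_free_old(zero, 0));
    assert(should_free_new(zero, 0));
}
```

Calling check_scenarios() from a main() runs the assertions. If get_slot() has already been reset to -1 or 0 by the time the test runs, expected_slot would hold that value instead and the patched test would not match any loaded slot greater than zero.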
I've run this code for the last few days without experiencing this problem anymore. I've also run it in debug mode 300 to validate that the logic is now doing the right thing. I'll check again tomorrow to make sure, but I am pretty confident I have fixed the problem I was seeing.

On Sun, Jan 5, 2014 at 5:08 PM, Kern Sibbald <k...@sibbald.com> wrote:
> Matthew,
>
> I am not sure how your suggested fix can actually do something
> useful. The code just a bit higher in Bacula always sets the
> slot to either -1 or 0, consequently, this test will always fail
> if something was actually unloaded.
>
> What you say you want it to do seems to me to be correct, but
> the code is, in my opinion, not at all doing what you write you
> expect it to do.
>
> Best regards,
> Kern
>
> On 01/03/2014 03:34 PM, Matthew Ife wrote:
>> Sorry if this is not the right place to put this; I tried to submit
>> this to mantis, but it did not respond with a confirmation email to
>> set up my account.
>>
>> We recently upgraded to the latest community version of bacula but
>> have seen some of our volumes landing in an error state on a daily
>> basis.
>>
>> The following errors would occur:
>>
>> 2014-01-03 10Director JobId 1075931: Using Device "DD7" to write.
>> 2014-01-03 10StorageDaemon JobId 1075945: 3305 Autochanger "load slot
>> 2694, drive 0", status is OK.
>> 2014-01-03 10Director JobId 1075947: Sending Accurate information.
>> 2014-01-03 10StorageDaemon JobId 1075945: Warning: vol_mgr.c:464 Need
>> volume from other drive, but swap not possible. Status: read=0
>> num_writers=0 num_reserve=1 swap=0 vol=FilersD-02694 from dev="DD7"
>> (/srv/bacula/backup/0/staging-disk-diff/drive7) to "DD0"
>> (/srv/bacula/backup/0/staging-disk-diff/drive0)
>> 2014-01-03 10StorageDaemon JobId 1075945: Warning: Volume
>> "FilersD-02694" not on device "DD0"
>> (/srv/bacula/backup/0/staging-disk-diff/drive0).
>> 2014-01-03 10StorageDaemon JobId 1075945: Marking Volume
>> "FilersD-02694" in Error in Catalog.
>>
>> Further investigation demonstrated that the volume is definitely meant
>> to be used for jobid 1075945.
>>
>> This issue occurs in the following conditions;
>>
>> * An autochanger is in use.
>> * There are many requests from the storage daemon for the director
>>   to send the best 20 volumes to try.
>> * The volume lock is contended.
>>
>> When a changer is in use, a device is selected from the changer and a
>> volume is selected.
>> The device and volume are inserted safely into the volume list.
>> It is then the duty of the autochanger to then load the volume from
>> the appropriate slot to the device. If the device already contains a
>> volume, this is unloaded and we try to free the volume we are
>> attempting to unload!
>>
>> However, this really just removes the acquired volume from the volume
>> list that we are about to load in. Since the volume list is contended
>> another thread will iterate to find a suitable volume. Sometimes it
>> will select the volume that was released by the job that should really
>> keep using it and (possibly) load it into its device that it has
>> acquired.
>>
>> The originating thread expects to have its original volume available
>> to it still. When it discovers later in the code that its volume is no
>> longer sitting on its device, a swap is attempted -- but since another
>> job now has acquired the volume it will fail to perform the swap, and
>> mark the volume in error.
>>
>> The problem here is we assume that what slot the autochanger has
>> loaded matches the slot for the requesting volume but since we have
>> not yet loaded the volume into the autochanger this is a dangerous
>> assumption.
>>
>> This situation is very often the case when bacula-sd is stopped and
>> started. The autochanger script may maintain an independent state of
>> which slots are loaded into which drives, which bacula-sd no longer
>> has any state for.
>> Thus on startup, for many of the devices in the autochanger
>> script's state, bacula-sd will have no historical knowledge of how
>> they got there.
>>
>> Note that performing an incorrect swap is only one outcome of this
>> problem (and the most obvious one); the result depends on what is
>> racing, who wins the race and what the race winner intends to do.
>>
>> I have provided a small patch, which changes the autochanger
>> behaviour. It will only free a volume on unload when the volume being
>> unloaded in the autochanger matches the volume bacula expects to be in
>> the autochanger.
>>
>> The patch also covers the zeroth slot since the code already makes a
>> check for that further up.
>>
>> diff -ur bacula-5.2.13/src/stored/autochanger.c bacula-5.2.13-new/src/stored/autochanger.c
>> --- bacula-5.2.13/src/stored/autochanger.c 2013-02-19 19:21:35.000000000 +0000
>> +++ bacula-5.2.13-new/src/stored/autochanger.c 2014-01-03 13:36:04.380454536 +0000
>> @@ -397,8 +397,11 @@
>>     }
>>     unlock_changer(dcr);
>>
>> -   if (loaded > 0) {           /* free_volume outside from changer lock */
>> -      free_volume(dev);        /* Free any volume associated with this drive */
>> +   if (dev->get_slot() == loaded) {
>> +      /* free_volume outside from changer lock */
>> +      /* avoid freeing volume when the autochanger slot differs */
>> +      /* from the running vol list. */
>> +      free_volume(dev);
>>     }
>>
>>     if (ok) {
>>
_______________________________________________
Bacula-devel mailing list
Bacula-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-devel