Greetings,
I've been having a concurrency problem with my fully disk-based autochanger
setup that results in volumes being erroneously marked with Error status.
As near as I can tell, the conditions that lead up to it are:
1: the selected volume is already loaded in a different drive from the one
assigned to the job
2: multiple jobs are running, competing for volumes within the same pool
3: Maximum Concurrent Jobs is set to 1 on each Device entry
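Condition 1 can only be recognized if the storage daemon still remembers which drive holds the volume. As a rough illustration (a simplified model, not Bacula's real data structures; `vol_entry`, `vol_table`, `remember_volume`, `find_loaded_drive` and `needs_swap` are all hypothetical names), a reservation check of that kind might look like:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an entry in the SD's volume list. */
struct vol_entry {
    const char *vol_name;   /* volume label; NULL marks a free slot */
    int drive_index;        /* drive the volume was last loaded in */
    int in_use;             /* nonzero while a job is using it */
};

#define MAX_VOLS 8

/* Hypothetical volume table; static storage starts zeroed (all slots free). */
static struct vol_entry vol_table[MAX_VOLS];

/* Return the drive holding vol_name, or -1 if no entry remembers it. */
int find_loaded_drive(const char *vol_name)
{
    for (size_t i = 0; i < MAX_VOLS; i++) {
        if (vol_table[i].vol_name != NULL &&
            strcmp(vol_table[i].vol_name, vol_name) == 0) {
            return vol_table[i].drive_index;
        }
    }
    return -1;
}

/* Record that vol_name is loaded in drive_index; returns 0, or -1 if full. */
int remember_volume(const char *vol_name, int drive_index)
{
    for (size_t i = 0; i < MAX_VOLS; i++) {
        if (vol_table[i].vol_name == NULL) {
            vol_table[i].vol_name = vol_name;
            vol_table[i].drive_index = drive_index;
            vol_table[i].in_use = 1;
            return 0;
        }
    }
    return -1;
}

/* Nonzero if a job assigned to want_drive must first get the volume moved. */
int needs_swap(const char *vol_name, int want_drive)
{
    int loaded = find_loaded_drive(vol_name);
    return loaded >= 0 && loaded != want_drive;
}
```

For example, after `remember_volume("Vol-0001", 0)`, `needs_swap("Vol-0001", 1)` returns 1 for a job assigned to drive 1. If the entry had already been freed, `find_loaded_drive()` would return -1 and the conflict would go unnoticed.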
Digging through the code, it looks like vol_mgr.c has special-case code to
handle a volume being swapped from one drive to another. However, it looks to
me like the code that makes this possible (namely, keeping the volume in the
vol_list even after volume_unused() is called, so that reserve_volume() can
detect that the volume is already loaded in another drive of the changer) is
specifically disabled for disk devices. From volume_unused() in vol_mgr.c, at
around line 587:
   /*
    * If this is a tape, we do not free the volume, rather we wait
    * until the autoloader unloads it, or until another tape is
    * explicitly read in this drive. This allows the SD to remember
    * where the tapes are or last were.
    */
   Dmsg4(dbglvl, "=== set not reserved vol=%s num_writers=%d dev_reserved=%d dev=%s\n",
      dev->vol->vol_name, dev->num_writers, dev->num_reserved(),
      dev->print_name());
   Dmsg1(dbglvl, "=== clear in_use vol=%s\n", dev->vol->vol_name);
   dev->vol->clear_in_use();
   if (dev->is_tape() || dev->is_autochanger()) {
      return true;
   } else {
      /*
       * Note, this frees the volume reservation entry, but the
       * file descriptor remains open with the OS.
       */
      return free_volume(dev);
   }
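A reduced model of the branch above shows what is left behind in each case. The names are hypothetical (the real code works on a DEVICE and its dev->vol entry), and whether a particular disk drive takes the free_volume() path depends on what is_tape() and is_autochanger() report for it; the model simply takes those two flags as inputs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a device and its volume entry. */
struct model_dev {
    bool is_tape;
    bool is_autochanger;
    const char *vol_name;  /* NULL once the reservation entry is freed */
    bool vol_in_use;
};

/*
 * Models the branch quoted above: the in-use flag is always cleared;
 * tape and autochanger devices keep the entry, anything else frees it.
 */
void model_volume_unused(struct model_dev *dev)
{
    if (dev->vol_name == NULL) {
        return;                 /* no volume reserved on this device */
    }
    dev->vol_in_use = false;    /* stands in for clear_in_use() */
    if (dev->is_tape || dev->is_autochanger) {
        return;                 /* entry kept: the SD remembers the volume */
    }
    dev->vol_name = NULL;       /* stands in for free_volume() */
}
```

With is_tape set, the entry survives with only its in-use flag cleared. With both flags false, vol_name becomes NULL, which is the state in which a later reservation can no longer tell where the volume is.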
It seems to me (with my admittedly very shallow understanding of the code so
far) that either this code should be changed to treat disk volumes the same
way as tapes, or code should be added to unload a disk-based autochanger drive
once the volume is no longer being used.
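The two alternatives can be compared in a toy model (hypothetical names, not a patch against vol_mgr.c): each drive tracks the volume the changer believes is loaded and the volume the SD's list remembers. Freeing only the list entry, as described above for disk, leaves the two views disagreeing; either keeping the entry or unloading the drive keeps them in step.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one drive in a disk-based autochanger. */
struct model_drive {
    const char *loaded_vol;   /* volume the changer believes is loaded */
    const char *listed_vol;   /* volume the SD's volume list remembers */
};

enum release_policy {
    FREE_ENTRY_ONLY,  /* the situation described above for disk */
    KEEP_ENTRY,       /* option 1: treat disk volumes like tapes */
    UNLOAD_DRIVE      /* option 2: unload the drive when done */
};

/* Release the drive's volume according to the chosen policy. */
void model_release(struct model_drive *d, enum release_policy policy)
{
    switch (policy) {
    case FREE_ENTRY_ONLY:
        d->listed_vol = NULL;   /* forgotten, but still "loaded" */
        break;
    case KEEP_ENTRY:
        break;                  /* stays loaded and remembered */
    case UNLOAD_DRIVE:
        d->loaded_vol = NULL;   /* unload first ... */
        d->listed_vol = NULL;   /* ... then forget */
        break;
    }
}

/* True when the changer's view and the volume list agree. */
bool views_consistent(const struct model_drive *d)
{
    if (d->loaded_vol == NULL || d->listed_vol == NULL) {
        return d->loaded_vol == d->listed_vol;
    }
    return strcmp(d->loaded_vol, d->listed_vol) == 0;
}
```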
Please forgive me if I'm way off base; I'm just trying to track down the cause
of my problem. :( A (rather lengthy) trace of a series of jobs that
illustrates the problem can be found at:
http://pastebin.com/raw.php?i=Ne0XJbBh
Any thoughts or ideas on where to keep looking?
Cheers,
Joe
_______________________________________________
Bacula-devel mailing list
https://lists.sourceforge.net/lists/listinfo/bacula-devel