On 4/25/2015 1:50 AM, Kern Sibbald wrote:
> In my last email, I did forget to mention that as you point out, the
> problem can also result from a design issue. And the resolution of
> those problems from design issues fall into my point 2. If we have a
> good test case that shows the problem, even if it results from a design
> decision, most of the time we can find a solution -- in some cases, we
> have added new directives, but in most cases, a bit more
> programming/logic can fix the problem.
>
> One of the biggest issues that I have with the current SD algorithm is
> that during the drive(s) reservation process (prior to starting the SD
> job) once a write drive is assigned, it cannot be changed. Changing a
> drive when multiple simultaneous jobs are writing is a non-trivial
> problem. There are solutions, but they require rather profound changes
> to the SD, which I have been planning for at least 5 years -- all the
> underlying code and algorithms now exist so it is a matter of time.

Thank you Kern. That is good news!

Have you considered using a single device-volume pair assignment, rather than both a device assignment and a separate volume assignment? I have found that the easiest way to avoid thread-related issues is to minimize the number of things that must be serialized. Since a job, at any given instant, will always require both a device and a volume, it might make sense to assign both at the same time as a single atomic operation. The device-volume pair assignment code can be serialized by a single mutex, and I believe that would greatly simplify the device and volume assignment code, as well as allow for changing a job's device in a safe manner.

Any time that a job requires a volume to write on, whether at job start up or end of previous volume, it requests a device-volume pair to continue writing on. Since only one job at a time can enter the assignment code, both device and volume state are guaranteed to be static while checking device and volume criteria and making a device-volume pair selection and unloading / loading the device as needed. In turn, a successful request guarantees that the device-volume pair returned is valid for the job, and an unsuccessful request guarantees that the job needs to wait for an appendable volume.

I believe that treating device and volume as a single unit would greatly simplify the assignment code. A single mutex for device-volume pairing should eliminate any chance of a race condition.
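
To make the idea a bit more concrete, here is a rough sketch in Python. It is not Bacula code and does not reflect how the SD is actually written. The names `Device`, `Volume`, `PairAllocator`, `request_pair` and `release_pair` are all made up for illustration. It also makes a simplifying assumption that one job has exclusive use of a device while writing, which is not true of the SD when several jobs share a drive. The only point is that device selection, volume selection and the (simulated) load all happen under one mutex.

```python
import threading
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Volume:
    # Hypothetical stand-in for a volume record.
    name: str
    appendable: bool = True


@dataclass
class Device:
    # Hypothetical stand-in for a drive.
    name: str
    loaded: Optional[Volume] = None  # volume currently in the drive
    busy: bool = False               # True while a job is writing to it


class PairAllocator:
    """Hands out (device, volume) pairs under a single mutex."""

    def __init__(self, devices: List[Device], volumes: List[Volume]):
        self._lock = threading.Lock()
        self._devices = list(devices)
        self._volumes = list(volumes)

    def request_pair(self) -> Optional[Tuple[Device, Volume]]:
        """Return a valid pair, or None if the job must wait."""
        with self._lock:
            # Device and volume state cannot change while we hold the lock.
            # First choice: an idle drive that already holds an appendable volume.
            for dev in self._devices:
                if not dev.busy and dev.loaded is not None and dev.loaded.appendable:
                    dev.busy = True
                    return dev, dev.loaded
            # Otherwise: put a free appendable volume into an idle drive.
            in_drives = {id(d.loaded) for d in self._devices if d.loaded is not None}
            for dev in self._devices:
                if dev.busy:
                    continue
                for vol in self._volumes:
                    if vol.appendable and id(vol) not in in_drives:
                        dev.loaded = vol  # stands in for the unload / load step
                        dev.busy = True
                        return dev, vol
            return None  # no appendable volume available right now

    def release_pair(self, dev: Device, volume_full: bool = False) -> None:
        """Called at job end, or at end of volume with volume_full=True."""
        with self._lock:
            if volume_full and dev.loaded is not None:
                dev.loaded.appendable = False
            dev.busy = False
```

A job that reaches end of volume would call `release_pair(dev, volume_full=True)` and then `request_pair()` again, and might get a different device back. Since both calls go through the same lock, that device change needs no extra coordination in this sketch.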
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users