On Tuesday 19 May 2009 14:12:13 James Harper wrote:
> > > Kern: I can commit this if it's okay with you?
> >
> > Thanks for finding this, but please do not commit the above patch at
>
> least for
>
> > the moment.  The problem is indeed a race condition as you found.
>
> Your fix
>
> > does significantly reduce the window for it to happen but does not
>
> eliminate
>
> > it.
> >
> > The correct solution is to put the jq mutex lock/unlock (i.e. P/V)
>
> around the
>
> > test for max_workers at line 357 of jobq.c and to move the
>
> incrementing of
>
> > the count as you have done.   However, it is even more complicated
>
> because
>
> > you only removed one of the many decrements of num_workers from
>
> jobq_server,
>
> > and it is not so easy to remove them all.  In addition for the
>
> jobq_server to
>
> > work correctly, it must be able to release and reset the lock.  Some
>
> careful
>
> > thought needs to go into this.  Possibly the solution is to just pull
>
> out the
>
> > main P() and V()s from the jobq_server routine on entry and exit but
>
> to leave
>
> > the 3 places in the code where the lock is temporarily released.
> >
> > Unless I hear otherwise from you, I will assume that you want to
>
> create the
>
> > patch.
>
> The calls to start_server are already protected by P() and V() so
> everything in start_server is fine.

The problem is in the test (line 357), which is not protected by a mutex.  
Therefore your fix does not definitively resolve the problem.

>
> I moved the increment from the thread to start_server to make the 'test
> then increment' an atomic operation, which is where the race is
> occurring. The decrement there is only to reverse the increment, and is
> again atomic.
>
> All other decrements are already protected by P() and V() as far as I
> can see.

If you are referring to your patch and the existing code in the above two 
paragraphs, then I already examined what they do and have concluded what I 
wrote above.   In my opinion, now that we are aware of the problem we should 
fix it correctly, and as long as the test at line 357 is not protected by a 
mutex, then the race condition is not definitively fixed.



Kern

------------------------------------------------------------------------------
Crystal Reports - New Free Runtime and 30 Day Trial
Check out the new simplified licensing option that enables 
unlimited royalty-free distribution of the report engine 
for externally facing server and web deployment. 
http://p.sf.net/sfu/businessobjects
_______________________________________________
Bacula-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-devel

Reply via email to