I really should have looked more closely at their posts as well. It took me a long time to remember how lock switching was supposed to work. I think it is far too brittle, and it was clearly poorly documented. I am going to add some inline documentation, and when I can break some time away I will write up some notes on what might be done to make this less brittle and easier to maintain going forward. I would like to move to a more fine-grained, "modern" locking model with clearly defined lock scopes and critical regions: something that contributors with a background in more modern parallel programming can feel comfortable with.

john

On Mon, Mar 26, 2012 at 9:54 AM, Alan M. Carroll <a...@network-geographics.com> wrote:

> Looks good. I apologize to weijin and ming_zym for being too distracted to
> look closely at those even though I meant to do so. Not being able to
> replicate the problem even on old code bases made it problematic as well.
>
> Sunday, March 25, 2012, 5:56:01 PM, you wrote:
>
> > I found the problem and it has nothing to do with any of this. The
> > problem, as quite rightly pointed out by weijin, is that when the closed
> > flag is set by one thread, another thread can delete the NetVC. This is
> > expected and desirable; unfortunately, however, do_io_close() is accessing
> > the "nh" variable (which happens to be in the NetVC), and it is
> > dereferencing it to get the mutex to check whether it is running in the
> > same thread as the NetVC, and it is doing so after the closed flag is set.
> >
> > The simple solution is to move the read of nh->mutex->thread_holding
> > above the setting of "closed".
> >
> > Please see patch TS-857-jp1.patch attached to TS-857.
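
For anyone reading the thread later, the ordering issue described in the quoted message can be sketched roughly as below. This is only an illustration under assumptions, not the contents of TS-857-jp1.patch: the struct definitions are simplified, hypothetical stand-ins for the real Traffic Server classes, and only the names nh, mutex, thread_holding, closed and do_io_close() come from the discussion above.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical, simplified stand-ins; NOT the real Traffic Server types.
struct Thread {};
struct Mutex { Thread *thread_holding = nullptr; };
struct NetHandler { Mutex *mutex = nullptr; };

struct NetVC {
  NetHandler *nh = nullptr;     // lives inside the NetVC itself
  std::atomic<int> closed{0};   // once non-zero, another thread may delete us

  // Returns true when the caller is the thread holding the NetHandler mutex.
  bool do_io_close(Thread *caller) {
    // Read everything we need from *this BEFORE publishing the closed flag.
    const bool same_thread = (nh->mutex->thread_holding == caller);

    // From this store onward, another thread is free to delete the NetVC,
    // so no member of *this (including nh) may be touched afterwards.
    closed.store(1, std::memory_order_release);

    return same_thread;
  }
};
```

The buggy ordering would set closed first and only then dereference nh, which is a use-after-free whenever the other thread wins the race. In the sketch, the result of the comparison is kept in a local so that nothing after the store depends on the object still being alive.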