On Thu, Mar 15, 2012 at 7:05 AM, Alan M. Carroll <
a...@network-geographics.com> wrote:

> Wednesday, March 14, 2012, 9:29:43 PM, John Plevyak wrote:
>
> > My view is that this is only one of many failure modes, albeit the most
> > common one.
>
> I disagree because only in the close case is the lock itself de-allocated.
> In all other cases the locks continue to be valid. So while all the other
> modes can be synchronized via locks, closing the NetVC cannot be. Before I
> started making fixes the crashes were almost all at the point of accessing
> the lock, not the NetVC itself.
>

The lock is reference counted.  It cannot be de-allocated while it is still
in use.  It is only de-allocated after the close() by which time all
references to that NetVC should have been dropped by the client.
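
A minimal sketch of that lifetime rule, under stated assumptions:
std::shared_ptr stands in for whatever reference-counting mechanism ATS
actually uses, and FakeNetVC and lock_refs_after_close() are hypothetical
names made up for illustration.  It only shows that a counted lock stays
valid for any remaining holder after the object that carried it is gone.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Hypothetical stand-in for a NetVC; the real ATS class is different.
struct FakeNetVC {
  std::shared_ptr<std::mutex> lock; // shared ownership of the lock
};

// Destroys the FakeNetVC while another holder still references the lock,
// then takes the lock through that holder. Returns the remaining count.
long lock_refs_after_close() {
  auto lock = std::make_shared<std::mutex>();   // count: 1
  auto *vc = new FakeNetVC{lock};               // count: 2
  std::shared_ptr<std::mutex> other = vc->lock; // count: 3
  delete vc;                                    // count: 2, lock survives
  std::lock_guard<std::mutex> guard(*other);    // still safe to acquire
  assert(other.get() == lock.get());            // same mutex object
  return other.use_count();
}
```

The mutex is freed only when the last shared_ptr to it goes away, which is
after every holder has dropped its reference.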


>
> >   If the locking was working, then the client would clear all
> > pointers to the netvc
>
> I am still failing to see how, in my example timeline, the client in
> thread A can cause the client in thread B to drop its NetVC pointer, or
> even detect the fact that there is a pointer in thread B. Even if the
> operations are completely temporally disjoint (the point of locking) it
> will crash when thread B accesses the invalid lock or NetVC. No
> simultaneous access is required in the example scenario.
>
> The entire point of the reference counting in the patch is to provide that
> detection mechanism, so that thread A can in fact wait for the thread B
> client to drop its pointers. I don't see how that can be done with only
> locks if the locks themselves can become dangling pointers.
>

Each transaction should have one (1) lock.  While that lock is held, all
pointers held by that transaction should be safe to access.  Only the
transaction has the power to close() the NetVC.  Before doing so the
transaction must drop all references to the NetVC.  The lock does need to
be reference counted, because every thread which might call the transaction
holds a pointer to the lock and an Action (lock + cancel boolean).  Such a
thread takes the lock and then checks the "cancel" flag before calling the
transaction.  If the transaction holds the lock and cancels all outstanding
operations, then it is free to release the lock and drop its own reference
to the lock, safe in the knowledge that it can't be activated by a stray
thread.  This is the procedure that made ATS rock solid from 1997 until
some bug was introduced.  Reference counting the NetVC is a band-aid for an
undiscovered bug.
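
The procedure can be sketched roughly as follows.  This is an illustration
under stated assumptions, not the ATS code: the Action struct, deliver(),
cancel() and demo() are hypothetical names, std::shared_ptr and std::mutex
stand in for the ATS types, and the two deliveries run one after the other
on a single thread to show the temporally disjoint case.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <mutex>

// Hypothetical Action: a counted lock plus a cancel flag.
struct Action {
  std::shared_ptr<std::mutex> lock;
  bool cancelled = false; // read and written only while holding *lock
};

// What a calling thread does: take the lock, check the flag, and only
// then call the transaction. Returns true if the callback ran.
bool deliver(const std::shared_ptr<Action> &a,
             const std::function<void()> &callback) {
  std::lock_guard<std::mutex> guard(*a->lock);
  if (a->cancelled) {
    return false; // transaction has gone away; do not touch it
  }
  callback();
  return true;
}

// What the transaction does before closing: cancel under the lock.
void cancel(const std::shared_ptr<Action> &a) {
  std::lock_guard<std::mutex> guard(*a->lock);
  a->cancelled = true;
}

// Returns how many callbacks reached the transaction.
int demo() {
  auto lock = std::make_shared<std::mutex>();
  auto action = std::make_shared<Action>();
  action->lock = lock;
  int calls = 0;

  deliver(action, [&] { ++calls; }); // before cancel: callback runs
  cancel(action);                    // transaction cancels the operation
  lock.reset();                      // and drops its own lock reference
  assert(action->lock);              // the Action keeps the lock alive
  deliver(action, [&] { ++calls; }); // stray call: sees flag, backs off
  return calls;
}
```

The late caller never dereferences anything owned by the transaction; it
touches only the Action and the lock, both of which it keeps alive through
its own references.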
