Clearly it should never be the case that the NetVC is deallocated while
still 'in use'.

NetHandler::mainNetEvent doesn't lock the VCs because it doesn't need to
until it wants to access the internal data at which time it locks it.
 Therefore the internal data is protected by the NetVC mutex.  However, the
existance is NOT protected by that mutex.

The NetHandler lock is used to protect the allocation, deletion of the
NetVC.  The owner of the NetVC lock can set the 'closed' flag, but can't
delete the NetVC unless they own the NetHandler lock.   A well functioning
protocol engine would hold the NetVC mutex,. remove all references to the
NetVC  then call do_io_close().

It looks like there are crashes because someone is calling do_io_close
without ensuring that all the holders of pointers to the NetVC have been
cleared.  Once do_io_close() is called, the NetVC is dead, it can't be
accessed any more because the NetHandler can access the "closed" flag
WITHOUT the NetVC lock, but WITH the NetHandler lock and delete the NetVC.

There are a variety of ways of throwing smart pointers at the "problem"
which basically confuse when the object is actually live and let people
continue to hold pointers to dead/closed objects in buggy code.

This doesn't change the fact that if they made sure not to hold stale
pointers, there wouldn't be a problem in the first place.   The only way a
NetVC is deleted is when it is closed, which is done by a protocol engine
which should be in charge of all pointers to the NetVC and should be
managing them.

Nobody should be holding a pointer to a NetVC after do_io_close() is
called.  That is the core problem.

john

On Fri, Dec 9, 2011 at 12:26 PM, Alan M. Carroll <
a...@network-geographics.com> wrote:

> The core problem is that VC connection locking does not work. VCs are
> deallocated while still in use which leads to crashes. In fact, the general
> handling of VCs in NetHandler::mainNetEvent doesn't lock the VCs at all, so
> the cross thread access can't be safe. Moreover, because all the VC
> references are stored as raw pointers, there's no good way to safely access
> them across threads or even detect deallocation. I tried putting in
> generation numbers but that was insufficient (you can still get crashes
> because the virtual function pointer has been swizzled by the deallocator
> before you even get to the VC). All the crashes in the listed bugs stem
> from VC locking failures leading to cross thread corruption.
>
> I've been at quite a loss on how to fix this.
>
> The two root causes, IMHO, are
>
> 1) VCs are ephemeral and stored purely by pointer, which means the
> accessor cannot have thread safe access. You can use pointers if your
> objects don't evaporate asynchronously. The use of raw pointers also means
> there's no way to detect dangling references.
>
> 2) Closing and deallocation are conflated, so that when a VC needs to be
> closed, it is also de-allocated. There are good reasons to get the close
> done ASAP but the de-allocation is not nearly a time critical.
>
> I am looking at using smart pointers to alleviate these two issues.
> However, that creates problems because the current queue mechanisms require
> the use of raw pointers. AFAICT the point of these is to minimize
> allocation on the fast path. The circular buffer is an attempt to have as
> little allocation impact as possible, otherwise I would just use a standard
> allocate per element style.
>
> Experimentally, the fix I did for TS-934 had a strong effect on reducing
> the problem, although it was not a complete solution. My attempt to expand
> it in TS-1031 did not end well, due to the lack of locks from NetHandler.
> At that point I was looking at a significant change, so I decided that I
> should try to do the best thing, rather than a minimalist hack which would
> not really be that much smaller a change.
>
> Friday, December 9, 2011, 11:38:47 AM, you wrote:
>
> > Could you be more specific about the problem (e.g. big numbers)?
> >  VConnection locking works fine in all cases that I know of.  There are
> > some issues with closing and deallocation with threading and the higher
> > layers, but these could be solved in a number of ways.  Smart pointers is
> > one (and probably not a bad one) although it isn't a panacea, just a
> > mechanism.  Similarly, a circular buffer queue is just a different way of
> > viewing the data structure.  Sure, it has some advantages, particularly
> in
> > exposure to deallocation and locking, but it isn't a silver bullet
> either.
> >  I'd like to understand the core problem.
>
>

Reply via email to