On Mon, Mar 14, 2022 at 04:14:47PM +0000, Visa Hankala wrote:
> On Mon, Mar 14, 2022 at 02:01:07AM -0700, Philip Guenther wrote:
> > On Mon, Mar 14, 2022 at 12:47 AM Visa Hankala <v...@hankala.org> wrote:
> > 
> > > On Sun, Mar 13, 2022 at 06:26:19PM -0700, Philip Guenther wrote:
> > > > On Sun, Mar 13, 2022 at 10:27 AM Visa Hankala <v...@hankala.org> wrote:
> > > >
> > > > > On Sun, Mar 13, 2022 at 04:29:44PM +0100, Mark Kettenis wrote:
> > > > >
> > > > ...
> > > >
> > > > > > Under what circumstances does memory ordering matter for these
> > > > > > interfaces?
> > > > >
> > > > > Consider the following scenario:
> > > > >
> > > > > struct widget {
> > > > >         struct refcnt   w_refcnt;
> > > > >         /* more fields spanning many cache lines */
> > > > >         ...
> > > > >         int             w_var;
> > > > > };
> > > > >
> > > > > First, CPU 1 executes:
> > > > >
> > > > >         w->w_var = 1;
> > > > >         refcnt_rele(&w->w_refcnt);      /* remains above zero */
> > > > >
> > > >
> > > > Having incremented the refcnt previously does not give this thread
> > > > exclusive access to 'w', so if it's writing to w->w_var then it must
> > > > either
> > > > a) have some sort of write lock taken, which it will release after
> > > > this and which will contain the necessary membar, OR
> > > > b) have the only access path to this structure (i.e., it's not yet
> > > > 'published' into structures which can be seen by other threads), in
> > > > which case the operations which do that 'publishing' of the access
> > > > to 'w' (adding it to a global list, etc.) must include the necessary
> > > > membar.
> > >
> > > Let's change the sequence to this:
> > >
> > >         local_var = atomic_load_int(&w->w_var);
> > >         refcnt_rele(&w->w_refcnt);
> > >
> > > Without the release barrier, is the load guaranteed to happen before
> > > the reference count is decremented?
> > >
> > 
> > That's not comparable to what you described before for "CPU 1",
> > because there's no write to anything but the refcnt.
> > 
> > If no one writes to the object that is ref counted, then there are no
> > visibility problems, no?
> > 
> > 
> > > > > Next, CPU 2 executes:
> > > > >
> > > > >         if (refcnt_rele(&w->w_refcnt))  /* refcnt drops to zero */
> > > > >                 free(w);
> > > > >
> > > >
> > > > How did CPU 2 get what is now exclusive access to 'w' without any
> > > > membars?  If that's possible then it was just accessing 'w' and
> > > > possibly not seeing the update to w->w_var even _before_ the
> > > > refcnt_rele(), so putting a membar in refcnt_rele() hides the
> > > > incorrect code by suppressing the later crash!
> > > >
> > > > If these membars appear to help then the code is and remains broken.
> > > > This change should not be done.
> > >
> > > It is not uncommon to see something like below:
> > >
> > > Access object with read intent:
> > >
> > >         mtx_enter(&w_lock);
> > >         w = lookup_from_list();
> > >         if (w != NULL)
> > >                 refcnt_take(&w->w_refcnt);
> > >         mtx_leave(&w_lock);
> > >         if (w == NULL)
> > >                 return;
> > >         ...
> > >
> > 
> > No writes to *w described *OR LEGAL*.
> > 
> > 
> > 
> > >         if (refcnt_rele(&w->w_refcnt))
> > >                 free(w);
> > >
> > > Delete object:
> > >
> > >         mtx_enter(&w_lock);
> > >         w = lookup_from_list();
> > >         if (w != NULL)
> > >                 remove_from_list(w);
> > >         mtx_leave(&w_lock);
> > >
> > 
> > This does the 'release' on the change to w's list link(s).
> > 
> > 
> > 
> > >         /* Release list's reference. */
> > >         if (w != NULL && refcnt_rele(&w->w_refcnt))
> > >                 free(w);
> > >
> > > Above, any refcnt_rele() can release the final reference.
> > >
> > > If there is no acquire barrier after the refcnt 1->0 transition, what
> > > is known about the CPU's local view of the object after refcnt_rele()?
> > >
> > 
> > Okay, if refcnt is used with objects that have embedded list links, or
> > could be, then refcnt_rele needs acquire semantics on the 1->0 transition.
> > I can agree with your justification for that.
> > 
> > Do you have a similar example that justifies giving it release semantics
> > as your diff proposed?  What is the data, otherwise unprotected by a
> > synchronization primitive, that has to be protected?
> 
> The release barrier prevents memory accesses from getting reordered
> across the decrement. Combined with the acquire barrier, the 1->0
> transition is a clean cutoff in activity.
> 
> The release barrier helps preserve "happens before" relations. What
> CPUs see just before the decrement will also be visible when
> refcnt_rele() returns non-zero.
> 
> I have not come up with a non-artificial example that could not be
> tackled with careful use of memory barriers or locking prior to
> refcnt_rele(). If the function did not provide the release barrier,
> though, the need for explicit synchronization might be surprising.
> 
> Thread A:
> 
>       mtx_enter(&w_lock);
>       w = lookup_from_list();
>       if (w != NULL)
>               refcnt_take(&w->w_refcnt);
>       mtx_leave(&w_lock);
>       if (w == NULL)
>               return;
> 
>       /* w_var is only accessed by thread A */
>       val = w->w_var;
>       computation(&val);
>       w->w_var = val;
> 
>       if (refcnt_rele(&w->w_refcnt))
>               free(w);
> 
> Thread B:
> 
>       mtx_enter(&w_lock);
>       w = lookup_from_list();
>       if (w != NULL)
>               remove_from_list(w);
>       mtx_leave(&w_lock);
>       /* Release list's reference. */
>       if (w != NULL && refcnt_rele(&w->w_refcnt))
>               free(w);
> 
> Above, thread A would need a release barrier before dropping its
> reference. Otherwise the w_var store might become visible too late,
> after thread B has already freed the object. This would be far from
> obvious if the code were more complicated.

Below is another, less far-fetched, example that shows that the current
state of affairs is not safe:

        mtx_enter(&w->w_mtx);
        ...
        mtx_leave(&w->w_mtx);
        if (refcnt_rele(&w->w_refcnt))
                free(w);

The mtx_leave() introduces a memory release barrier that forces all
preceding memory accesses to complete before the decrement. However,
there is no memory barrier after the write that releases the mutex.

SPL adjusting aside, mtx_leave() essentially does this:

        membar_exit();
        mtx->mtx_owner = NULL;

Now, because of inadequate memory synchronization, the writing
of mtx_owner can become visible to other CPUs after the decrement.
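
To illustrate, another CPU on a weakly ordered machine may observe the
stores in the following order (a sketch, not actual code; member names
are illustrative):

        /* Program order on the releasing CPU: */
        membar_exit();                          /* in mtx_leave() */
        mtx->mtx_owner = NULL;                  /* in mtx_leave() */
        atomic_dec_int_nv(&w->w_refcnt.refs);   /* in refcnt_rele() */

        /* One order another CPU may observe: */
        atomic_dec_int_nv(&w->w_refcnt.refs);   /* refcnt drops to zero */
        mtx->mtx_owner = NULL;                  /* store into freed memory */

If that CPU frees the object as soon as it sees the final decrement,
the late mtx_owner store hits freed, and possibly reused, memory.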

I think the sanest solution is to add the release and acquire barriers
in refcnt_rele().
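
For reference, a minimal sketch of what that could look like, assuming
the existing atomic_dec_int_nv()-based refcnt_rele() in kern_synch.c
(exact member and KASSERT details aside). The *_before_atomic and
*_after_atomic barrier variants reduce to no-ops on machines whose
atomic operations already provide the required ordering:

        int
        refcnt_rele(struct refcnt *r)
        {
                u_int refcnt;

                /* Order all prior loads and stores before the decrement. */
                membar_exit_before_atomic();
                refcnt = atomic_dec_int_nv(&r->refs);
                KASSERT(refcnt != ~0);
                if (refcnt == 0) {
                        /* Order the freeing code after the 1->0 drop. */
                        membar_enter_after_atomic();
                        return (1);
                }
                return (0);
        }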
