On Fri, Jul 01, 2022 at 09:59:11AM +0200, Claudio Jeker wrote:
> On Thu, Jun 30, 2022 at 03:46:35PM +0000, Visa Hankala wrote:
> > On Thu, Jun 30, 2022 at 11:51:52AM +0200, Claudio Jeker wrote:
> > > After discussing this with mpi@ and jmatthew@ we came to the conclusion
> > > that we need to smr_barrier() before refcnt_finalize() to ensure that no
> > > other CPU is between the SMR_TAILQ_FOREACH, refcnt_take() and
> > > smr_read_leave().
> > 
> > [...]
> > 
> > > @@ -509,7 +487,8 @@ route_input(struct mbuf *m0, struct sock
> > >           return;
> > >   }
> > >  
> > > - SRPL_FOREACH(rop, &sr, &rtptable.rtp_list, rop_list) {
> > > + smr_read_enter();
> > > + SMR_TAILQ_FOREACH(rop, &rtptable.rtp_list, rop_list) {
> > >           /*
> > >            * If route socket is bound to an address family only send
> > >            * messages that match the address family. Address family
> > > @@ -519,7 +498,8 @@ route_input(struct mbuf *m0, struct sock
> > >               rop->rop_proto != sa_family)
> > >                   continue;
> > >  
> > > -
> > > +         refcnt_take(&rop->rop_refcnt);
> > > +         smr_read_leave();
> > >           so = rop->rop_socket;
> > >           solock(so);
> > >  
> > > @@ -579,8 +559,10 @@ route_input(struct mbuf *m0, struct sock
> > >           rtm_sendup(so, m);
> > >  next:
> > >           sounlock(so);
> > > +         smr_read_enter();
> > > +         refcnt_rele_wake(&rop->rop_refcnt);
> > 
> > This does not look correct.
> > 
> > smr_barrier() can proceed after smr_read_leave(), so refcnt_rele_wake()
> > might drop the final reference and this thread can no longer access
> > rop safely (SMR_TAILQ_NEXT() inside SMR_TAILQ_FOREACH()).
> > 
> > Also, SMR_TAILQ_NEXT() of rop becomes potentially dangling after
> > smr_read_leave(). After this thread leaves the read-side critical
> > section, another thread might free rop's successor.
> 
> So we need to either smr_barrier() before and after the refcnt_finalize()
> to make sure that the rop pointer remains stable in both cases or we alter
> the SMR_TAILQ_FOREACH() loop so that SMR_TAILQ_NEXT can be grabbed before
> refcnt_rele_wake().
> 
> While the double smr_barrier() is trivial it is not ideal and I think it
> is better to adjust the loop since SMR loops with sleep points is a
> somewhat common issue and so we should have a good clear way on how to
> solve it.

Adjusting SMR_TAILQ_FOREACH() will not help.

In general, a reader cannot resume a lockless iteration after it has
left the read-side critical section and crossed a sleep point. The
guarantee of consistent(-looking) forward linkage is gone. The reader
no longer knows if the value of SMR_TAILQ_NEXT() is valid. If the
reader wants to continue with the list, it has to re-enter the read-side
critical section and restart the iteration.


I guess I should finish the sleepable variant of SMR that I was
tinkering with long ago...

Reply via email to