On Tue, Jun 12, 2018 at 11:50:46AM +0300, jackm wrote: > On Fri, 8 Jun 2018 10:42:18 -0700 > Matthew Wilcox <wi...@infradead.org> wrote: > > > + rcu_read_lock(); > > + mad_agent = idr_find(&ib_mad_clients, hi_tid); > > + if (mad_agent > > && !atomic_inc_not_zero(&mad_agent->refcount)) > > + mad_agent = NULL; > > + rcu_read_unlock(); > > Hi Matthew, > > I don't see the flow which can explain using atomic_inc_not_zero() here. > > The refcount will go to zero only when unregister_mad_agent() is > called (code below, see asterisks): > idr_lock(&ib_mad_clients); > *** idr_remove(&ib_mad_clients, mad_agent_priv->agent.hi_tid); > idr_unlock(&ib_mad_clients); > > flush_workqueue(port_priv->wq); > ib_cancel_rmpp_recvs(mad_agent_priv); > > *** deref_mad_agent(mad_agent_priv); > [JPM] The call to idr_find in the interrupt context > would need to occur here for the refcount to have a > possibility of being zero. > Shouldn't idr_find in the interrupt context fail, since > idr_remove has already been invoked?
RCU is tricky. Here's the flow: CPU 0 CPU 1 rcu_read_lock(); mad_agent = idr_find(&ib_mad_clients, hi_tid); idr_lock(&ib_mad_clients); idr_remove(&ib_mad_clients, mad_agent_priv->agent.hi_tid); idr_unlock(&ib_mad_clients); flush_workqueue(port_priv->wq); ib_cancel_rmpp_recvs(mad_agent_priv); deref_mad_agent(mad_agent_priv); Now, you're going to argue that CPU 0 is running in interrupt context, but with virtualisation, it can have the CPU taken away from it at any time. This window which looks like a couple of instructions long can actually be seconds long. > wait_for_completion(&mad_agent_priv->comp); > > The refcount will be able to go to zero only after deref_mad_agent is > called above. Before this, however, idr_remove() has been called -- > so, if my understanding is correct, the idr_find call in > find_mad_agent() should not succeed since the refcount can get to zero > only AFTER the idr_remove call. > > Could you please explain the flow which can result in idr_find > succeeding (in the interrupt context) after idr_remove has been invoked > (in the process context)? Will idr_find succeed even after > idr_remove, and only fail after kfree_rcu is invoked as well? (or, > maybe after some garbage-collection delay?) Ordering is weird in SMP systems. You can appear to have causality violations when you're operating locklessly (and rcu_read_lock() is essentially lockless). So we can absolutely observe the store to agent->refcount before we observe the store to idr->agent. Documentation/memory-barriers.txt has a LOT more information on this.