On Fri, Jan 11, 2019 at 07:55:03AM -0800, Bart Van Assche wrote:
> On Fri, 2019-01-11 at 13:48 +0100, Peter Zijlstra wrote:
> > I spotted this new v6 in my inbox and have rebased to it.
>
> Thanks!
>
> > On Wed, Jan 09, 2019 at 01:01:48PM -0800, Bart Van Assche wrote:
> > > The changes compared to v5 are:
> > > - Modified zap_class() such that it doesn't try to free a list entry that
> > >   is already being freed.
> >
> > I however have a question on this; this seems wrong. Once a list entry
> > is enqueued it should not be reachable anymore. If we can reach an entry
> > after call_rcu() happened, we've got a problem.
>
> Apparently I confused you - sorry that I was not more clear. What I meant is
> that I changed a single if test into a loop. The graph lock is held while that
> loop is being executed so the code below is serialized against the code called
> from inside the RCU callback:
>
> @@ -4574,8 +4563,9 @@ static void zap_class(struct pending_free *pf, struct lock_class *class)
>  		entry = list_entries + i;
>  		if (entry->class != class && entry->links_to != class)
>  			continue;
> -		if (__test_and_set_bit(i, pf->list_entries_being_freed))
> +		if (list_entry_being_freed(i))
>  			continue;

Yes, it is the above change that caught my eye.. That checks _both_ your
lists. One is your current open one (@pf), but the other could already
be pending the call_rcu().

So my question is why do we have to check both ?! How come the old code,
that only checked @pf, is wrong?

> +		set_bit(i, pf->list_entries_being_freed);
>  		nr_list_entries--;
>  		list_del_rcu(&entry->entry);
>  	}
>
> Please let me know if you need more information.
>
> Bart.
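
To make the question concrete, here is a minimal userspace sketch of the two checks being compared. It is not the code from the patch: the array `pending[2]`, the helpers `bit_is_set()`, `mark_bit()` and `being_freed_in()`, and the size `MAX_ENTRIES` are assumptions made for illustration, and the body of `list_entry_being_freed()` is only a guess at what "checks both lists" means. Only the names `pending_free`, `list_entries_being_freed` and `list_entry_being_freed` come from the quoted diff.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sizes; the real lockdep tables are sized differently. */
#define MAX_ENTRIES   64
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS  ((MAX_ENTRIES + BITS_PER_WORD - 1) / BITS_PER_WORD)

struct pending_free {
	unsigned long list_entries_being_freed[BITMAP_WORDS];
};

/*
 * Assumption: two batches exist, one still open (@pf in the thread) and
 * one that may already have been handed to call_rcu().
 */
static struct pending_free pending[2];

/* Plain (non-atomic) bit test, standing in for the kernel's test_bit(). */
static bool bit_is_set(const unsigned long *map, size_t i)
{
	return (map[i / BITS_PER_WORD] >> (i % BITS_PER_WORD)) & 1UL;
}

/* Plain (non-atomic) bit set, standing in for the kernel's set_bit(). */
static void mark_bit(unsigned long *map, size_t i)
{
	map[i / BITS_PER_WORD] |= 1UL << (i % BITS_PER_WORD);
}

/* Old check: consult only the batch passed in. */
static bool being_freed_in(const struct pending_free *pf, size_t i)
{
	return bit_is_set(pf->list_entries_being_freed, i);
}

/* New check (guessed): true if either batch has the bit set. */
static bool list_entry_being_freed(size_t i)
{
	for (size_t k = 0; k < 2; k++)
		if (bit_is_set(pending[k].list_entries_being_freed, i))
			return true;
	return false;
}
```

In this model the two checks differ only for an index whose bit is set in the batch that is not @pf, which is exactly the case the question above is about: whether zap_class() can still encounter such an entry while walking the list entries.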