> Sorry, this is a resend because the previous one was messed > up by my editor and hard to be read. > > void finish_wait( > struct wait_queue_head *wq_head, > struct wait_queue_entry *wq_entry) > { > .... > -> if (!list_empty_careful(&wq_entry->entry)) { > -> spin_lock_irqsave(&wq_head->lock, flags); > -> list_del_init(&wq_entry->entry); > -> spin_unlock_irqrestore(&wq_head->lock, flags); > -> } > } > > After careful look into the stop/wakeup code, I found > the above code fragile or even buggy. This code was > introduced at least 14 years ago and seems fragile or > buggy now after years of study on SMP synchronization > by us. > > I understand this code are being used a lot and no bug > seems to emerge. But, as I'll explain, it depends on a lot > of unreliable implementation details. > > Firstly, > > static inline int list_empty_careful(const struct list_head *head) > { > struct list_head *next = head->next; > return (next == head) && (next == head->prev); > } > > note that the read of head->next & head->prev is not > protected by READ_ONCE. So the read is free to be > optimized out entirely. Luckily, this optimization is hard > for compilers now, since all other accesses are out of > finish_wait. And still, GCC won't spit aligned accesses > into multiple instructions so it is atomic so far. > > Secondly, > > if ( ! list_empty_careful(&wq_entry->entry) ) { > <remove entry with spinning-lock> > } > <ends stack frame of the function calling finish_wait> > <overwrites wq_entry with another frame> > > and > > __wake_up_common() --> > <read wq_entry->func> --> > <read wq_entry->flags> --> > autoremove_wake_function() --> > <remove wq_entry->entry from wait queue> --> > > are not properly ordered for SMP so that <read wq_entry->flags> > may be reordered after <remove wq_entry->entry from wait queue> > since no dependency or memory barrier forbids it. This may cause > <overwrites wq_entry with another frame> on one CPU takes place > before <read wq_entry->flags> on another CPU and cause > <read wq_entry->flags> to return bad value. > > This behavior is not reported may thank to: > - few code is using autoremove_wake_function > - CPU pipeline is not as deep as to make this emerge >
Hi, mingo, peterz Would you please have a look at it if it won't waste you much time. Thanks a lot. CC scheduler maintainers Best regards, Patrick Trol