Hi Herton,
On 08/10/2015 05:31 PM, Herton R. Krzesinski wrote:
Well without the synchronize_rcu() and with the semid list loop fix I was still
able to get issues, and I thought the problem is related to racing with IPC_RMID
on freeary again. This is one scenario I would imagine:
A B
freeary()
list_del(&un->list_id)
spin_lock(&un->ulp->lock)
un->semid = -1
list_del_rcu(&un->list_proc)
__list_del_entry(&un->list_proc)
__list_del(entry->prev, entry->next) exit_sem()
next->prev = prev; ...
prev->next = next; ...
... un =
list_entry_rcu(ulp->list_proc.next...)
(&un->list_proc)->prev = LIST_POISON2 if (&un->list_proc ==
&ulp->list_proc) <true, last un removed by thread A>
... kfree(ulp)
spin_unlock(&un->ulp->lock) <---- bug
Now that is a very tight window, but I had problems even when I tried this patch
first:
(...)
- if (&un->list_proc == &ulp->list_proc)
- semid = -1;
- else
- semid = un->semid;
+ if (&un->list_proc == &ulp->list_proc) {
+ rcu_read_unlock();
What about:
+ spin_unlock_wait(&ulp->lock);
+ break;
+ }
+ spin_lock(&ulp->lock);
+ semid = un->semid;
+ spin_unlock(&ulp->lock);
+ /* exit_sem raced with IPC_RMID, nothing to do */
if (semid == -1) {
rcu_read_unlock();
- break;
+ synchronize_rcu();
+ continue;
}
(...)
So even with the bad/uneeded synchronize_rcu() which I had placed there, I could
still get issues (however the testing on patch above was on an older kernel than
latest upstream, from RHEL 6, I can test without synchronize_rcu() on latest
upstream, however the affected code is the same). That's when I thought of
scenario above. I was able to get this oops:
Adding sleep() usually help, too. But it is ugly, so let's try to
understand the race and to fix it.
Best regards,
Manfred
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/