On Wed, Jun 15, 2016 at 11:11:43AM +0300, Konstantin Belousov wrote:
> On Tue, Jun 14, 2016 at 10:26:14PM -0500, Eric Badger wrote:
> > I believe they all have more or less the same cause. The crashes occur
> > because we acquire a knlist lock via the KN_LIST_LOCK macro, but when we
> > call KN_LIST_UNLOCK, the knote's knlist reference (kn->kn_knlist) has
> > been cleared by another thread. Thus we are unable to unlock the
> > previously acquired lock and hold it until something causes us to crash
> > (such as the witness code noticing that we're returning to userland with
> > the lock still held).
> > I believe there's also a small window where the KN_LIST_LOCK macro
> > checks kn->kn_knlist and finds it to be non-NULL, but by the time it
> > actually dereferences it, it has become NULL. This would produce the
> > "page fault while in kernel mode" crash.
> > If someone familiar with this code sees an obvious fix, I'll be happy to
> > test it. Otherwise, I'd appreciate any advice on fixing this. My first
> > thought is that a "struct knote" ought to have its own mutex for
> > controlling access to the flag fields and ideally the "kn_knlist"
> > field.
> > I.e., you would first acquire a knote's lock and then the knlist lock,
> > thus ensuring that no one could clear the kn_knlist variable while you
> > hold the knlist lock. The knlist lock, however, usually comes from
> > whichever event-producing entity the knote tracks, so getting lock
> > ordering right between the per-knote mutex and this other lock seems
> > potentially hard. (Sometimes we call into functions in kern_event.c with
> > the knlist lock already held, having been acquired in code outside of
> > kern_event.c. Consider, for example, calling KNOTE_LOCKED from
> > kern_exit.c; the PROC_LOCK macro has already been used to acquire the
> > process lock, also serving as the knlist lock).
> This sounds like a good and correct analysis. I tried your test program
> for around an hour on an 8-thread machine, but was not able to trigger the
> issue. Maybe Peter will have better luck reproducing it. Still, I think
> that the problem is there.
I got this after 10 runs:
userret: returning with the following locks held:
exclusive sleep mutex process lock (process lock) r = 0 (0xcb714758) locked @
cpuid = 0
KDB: stack backtrace:
kdb_backtrace(c17a92d1,0,c1228287,f3b29b94,0,...) at kdb_backtrace+0x2d/frame
vpanic(c1228287,f3b29b94,c1228287,f3b29b94,f3b29b94,...) at vpanic+0x115/frame
witness_warn(2,0,c15ba937,f3b29ca8,c0c018d0,...) at witness_warn+0x32a/frame
userret(cc2e1340,f3b29ce8,c15aadd7,4,0,...) at userret+0x92/frame 0xf3b29c20
syscall(f3b29ce8) at syscall+0x50e/frame 0xf3b29cdc
I'll apply the patch and test.
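
For reference, the failure mode in the quoted analysis (lock taken through
kn->kn_knlist, pointer cleared, unlock skipped) can be modeled in userspace.
The sketch below is only an illustration under stated assumptions: the
struct and function names (knlist_model, knote_model, model_*) are
hypothetical, pthread mutexes stand in for the kernel lock, and the
"other thread" is simulated by a plain assignment so the outcome is
deterministic. It is not the patch discussed in this thread, and taking a
snapshot of the pointer does not by itself address the lifetime of the
knlist.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; NOT the kernel's knlist/knote. */
struct knlist_model {
	pthread_mutex_t kl_lock;
};

struct knote_model {
	struct knlist_model *kn_knlist;	/* may be cleared by a detach */
};

/* Pattern from the analysis: kn_knlist is re-read at lock and unlock. */
static void
model_lock_reread(struct knote_model *kn)
{
	if (kn->kn_knlist != NULL)
		pthread_mutex_lock(&kn->kn_knlist->kl_lock);
}

static void
model_unlock_reread(struct knote_model *kn)
{
	if (kn->kn_knlist != NULL)
		pthread_mutex_unlock(&kn->kn_knlist->kl_lock);
}

/* Alternative: read kn_knlist once and unlock through the snapshot. */
static struct knlist_model *
model_lock_snapshot(struct knote_model *kn)
{
	struct knlist_model *knl = kn->kn_knlist;

	if (knl != NULL)
		pthread_mutex_lock(&knl->kl_lock);
	return (knl);
}

static void
model_unlock_snapshot(struct knlist_model *knl)
{
	if (knl != NULL)
		pthread_mutex_unlock(&knl->kl_lock);
}

/* True if the mutex is currently held (trylock fails when it is). */
static bool
model_lock_is_held(struct knlist_model *knl)
{
	if (pthread_mutex_trylock(&knl->kl_lock) == 0) {
		pthread_mutex_unlock(&knl->kl_lock);
		return (false);
	}
	return (true);
}

/* Returns true when the lock is leaked by the re-read pattern. */
static bool
model_leaks_with_reread(void)
{
	struct knlist_model knl;
	struct knote_model kn = { .kn_knlist = &knl };
	bool leaked;

	pthread_mutex_init(&knl.kl_lock, NULL);
	model_lock_reread(&kn);
	kn.kn_knlist = NULL;		/* simulates the other thread */
	model_unlock_reread(&kn);	/* sees NULL, skips the unlock */
	leaked = model_lock_is_held(&knl);
	if (leaked)
		pthread_mutex_unlock(&knl.kl_lock);	/* cleanup only */
	pthread_mutex_destroy(&knl.kl_lock);
	return (leaked);
}

/* Returns true when the lock is leaked by the snapshot pattern. */
static bool
model_leaks_with_snapshot(void)
{
	struct knlist_model knl;
	struct knote_model kn = { .kn_knlist = &knl };
	struct knlist_model *held;
	bool leaked;

	pthread_mutex_init(&knl.kl_lock, NULL);
	held = model_lock_snapshot(&kn);
	kn.kn_knlist = NULL;		/* simulates the other thread */
	model_unlock_snapshot(held);	/* unlocks what was locked */
	leaked = model_lock_is_held(&knl);
	if (leaked)
		pthread_mutex_unlock(&knl.kl_lock);	/* cleanup only */
	pthread_mutex_destroy(&knl.kl_lock);
	return (leaked);
}
```

In this model, model_leaks_with_reread() reports the lock as still held,
which corresponds to the "returning with the following locks held" witness
report above, while model_leaks_with_snapshot() does not.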