On Tue, Dec 03, 2013 at 08:03:21PM +0700, Robert Elz wrote: > Date: Tue, 3 Dec 2013 13:04:30 +0100 > From: Manuel Bouyer <[email protected]> > Message-ID: <[email protected]> > > | I think what happens is: > | - the kernel is running in a code portion protected by > | a spin mutex registered at IPL_FOO, with IPL_FOO < IPL_SCHED > | - a clock interrupt comes in, > > Yes, that would explain it all - I was looking for any cases where > mutexes aren't released, and hadn't found any (and it would have had to be > something rare or the forgotten mutex would cause a hang eventually). > > This is a better explanation - it didn't occur to me as I had this > impression that spin mutexes would always raise the ipl to IPL_HIGH > (which didn't seem so bad, as they should be used and released very > quickly) but now you mention it, I did see the init code, and the > ipl value in the mutex itself... > > | Now I guess it's done this way because it's allowed to release mutexes > | in any order: > > Yes, that much I understood... I'm not sure I agree with it as a > programming model (for spin mutexes - for other more general locking > the flexibility is probably required) - but in general, apart from > very special cases (mutex_spin_enter being the canonical case of course) > functions should not be returning with spin mutexes (they grabbed) still > held, it should be grab/work/release - if that were the model, the > issue of re-ordering the releases wouldn't ever arise ("work" - which might > also include interrupts firing - might grab a spin mutex itself, so nested > mutexes are clearly needed, only correctly nested ones probably ought to > be supported.)
I'm not sure about this; there are cases (e.g. in processing chains) where you need to grab the next mutex before releasing the previous one to avoid races. > > In any case, since your analysis shows how this all happens without presuming > any bugs, I think we can assume that for this, there are no bugs. All that > remains is to decide what to do with the printf (well, not much of a choice > there - just delete it, it is annoying and now useless...) and the forced > resetting (lowering) of ci_ilevel that is currently being done. I don't > see that doing any harm, especially not in this case, but ... I'm just removed this block of code from HEAD. I will request a pullup in a few days if all is well. Note that resetting ci_ilevel here isn't needed, the code will set is anyway in the next iteration of the loop, or at loop exit. -- Manuel Bouyer <[email protected]> NetBSD: 26 ans d'experience feront toujours la difference --
