On Wed, Nov 11, 2020 at 11:54:41AM +0100, Ingo Molnar wrote:
> > We cannot get reported other than the first one.
>
> Correct. Experience has shown that the overwhelming majority of
> lockdep reports are single-cause and single-report.
>
> This is an optimal approach, because after a decade of exorcising
> locking bugs from the kernel, lockdep is currently, most of the time,

I also think Lockdep has been doing a great job exorcising almost all
locking bugs so far. I respect that.

> in 'steady-state', with there being no reports for the overwhelming
> majority of testcases, so the statistical probability of there being
> just one new report is by far the highest.

This is true if Lockdep is only for checking whether maintainers' trees
are OK, and if we totally ignore how a tool could help folks in the
middle of development, especially when developing something complicated
with respect to synchronization. But I don't agree if the tool is also
meant to help while developing something that could introduce many
dependency issues.

> If on the other hand there's some bug in lockdep itself that causes
> excessive false positives, it's better to limit the number of reports
> to one per bootup, so that it's not seen as a nuisance debugging
> facility.
>
> Or if lockdep gets extended that causes multiple previously unreported
> (but very much real) bugs to be reported, it's *still* better to
> handle them one by one: because lockdep doesn't know whether it's real

Why do you think we cannot handle them one by one with multi-reporting?
We can still start with the first one, as we do with single-reporting.
That's also how we work, for example, when building the kernel or
something.

> > So the one who has introduced the first one should fix it as soon
> > as possible so that the other problems can be reported and fixed.
> > It will get even worse if it's a false positive because it's
> > worth nothing but only preventing reporting real ones.
>
> Since kernel development is highly distributed, and 90%+ of new
> commits get created in dozens of bigger and hundreds of smaller
> maintainer topic trees, the chance of getting two independent locking
> bugs in the same tree without the first bug being found & fixed is
> actually pretty low.

Again, this is true if Lockdep is for checking maintainers' trees only.

> linux-next offers several weeks/months advance integration testing to
> see whether the combination of maintainer trees causes
> problems/warnings.

Good for us.

> > That's why kernel developers are so sensitive to Lockdep's false
> > positive reporting - I would, too. But precisely speaking, it's a
> > problem of how Lockdep was designed and implemented, not false
> > positive itself. Annoying false positives - as WARN()'s messages are
> > annoying - should be fixed but we don't have to be as sensitive as we
> > are now if the tool keeps normally working even after reporting.
>
> I disagree, and even for WARN()s we are seeing a steady movement
> towards WARN_ON_ONCE(): exactly because developers are usually
> interested in the first warning primarily.
>
> Followup warnings are even marked 'tainted' by the kernel - if a bug
> happened we cannot trust the state of the kernel anymore, even if it
> seems otherwise functional. This is doubly true for lockdep, where

I definitely think so. An already tainted kernel is not a kernel we can
trust anymore. Again, IMO, a tool should help us not only in checking
almost-final trees but also in developing something. No?

> But for lockdep there's another concern: we do occasionally report
> bugs in locking facilities themselves. In that case it's imperative
> for all lockdep activity to cease & desist, so that we are able to get
> a log entry out before the kernel goes down potentially.

Sure. Makes sense.

> I.e. there's a "race to log the bug as quickly as possible", which is
> the other reason we shut down lockdep immediately. But once shut down,

Not sure I understand this part.

> all the lockdep data structures are hopelessly out of sync and it
> cannot be restarted reasonably.

Is it about tracking IRQ and IRQ-enabled state? That's exactly what I'd
like to point out. Or is there something else?

> Not sure I understand the "problem 2)" outlined here, but I'm looking
> forward to your patchset!

Thank you for the response.

Thanks,
Byungchul