ACPI, device interrupts, and suspend states
http://lwn.net/Articles/146094/
The 2.6.13-rc5 prepatch brought with it the reversal of a couple of
ACPI-related patches. A look at what happened is rewarding in that it
shows how hard it can be to get some things right, and how the kernel
development model tries to address these issues.
Earlier 2.6.13 prepatches included a change to the core ACPI system. Whenever the system (or a part of it) is being suspended, the modified ACPI code would break the link which routed device interrupts into the processor. This change is part of a new set of rules which expects every device to release its interrupt line on suspend, and to reacquire it on resume. There are a few reasons for wanting to do things this way:
The problem with the ACPI change is that it breaks a large number of drivers and, as a result, breaks suspend on systems where it used to work. The power management hackers seem to see this situation as an unfortunate but necessary step toward getting suspend working reliably on a much broader range of hardware. Having individual drivers release and reacquire their interrupts is also seen as necessary to support runtime power management: suspending individual devices in a running system to save power. The ACPI change, it is said, fixes more systems than it breaks, and is thus worthwhile. Linus disagreed and reverted the patch, saying:

"The thing is, we're better off making very very slow progress that is _steady_, than having people who _used_ to have things work for them suddenly break.

"So I believe that if we fix two machines and break one machine, we've actually regressed. It doesn't matter that we fixed more than we broke: we _still_ regressed. Because it means that people can't trust the progress we make!"

The right solution, according to Linus, is to go ahead and add the free_irq() and request_irq() calls to individual drivers when it makes sense to do so, and when it does not break things for individual users. Meanwhile, however, the ACPI subsystem should still restore the interrupt state on resume so that unmodified drivers do not break. There are some remaining issues with how that is done: it may involve running the ACPI AML interpreter with interrupts disabled, which leads to a number of interesting situations. Benjamin Herrenschmidt also pointed out that it could leave drivers unable to receive interrupts during the resume process. Eventually, one assumes, these details will be worked out. In the meantime, it will be interesting to see if the "revert any patch that breaks somebody's machine" policy holds. If it leads to a more stable experience for Linux users, it seems like it would be a good thing.
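To make the driver-side rule concrete, here is a minimal user-space sketch of a driver that releases its interrupt line on suspend and reacquires it on resume. It is not kernel code: mock_request_irq(), mock_free_irq(), struct demo_dev and the 16-entry irq_owner table are hypothetical stand-ins invented for illustration. The real request_irq() and free_irq() take additional arguments, such as the handler and a device cookie.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_IRQS 16

/* Hypothetical stand-in for the kernel's interrupt bookkeeping:
 * irq_owner[n] names the driver holding line n, or is NULL if free. */
static const char *irq_owner[NR_IRQS];

/* Mock of request_irq(): claim a free line; 0 on success, -1 otherwise. */
static int mock_request_irq(unsigned int irq, const char *name)
{
    if (irq >= NR_IRQS || irq_owner[irq] != NULL)
        return -1;
    irq_owner[irq] = name;
    return 0;
}

/* Mock of free_irq(): release a line, but only if this driver holds it. */
static void mock_free_irq(unsigned int irq, const char *name)
{
    if (irq < NR_IRQS && irq_owner[irq] == name)
        irq_owner[irq] = NULL;
}

/* Hypothetical per-device state for a driver following the new rules. */
struct demo_dev {
    const char *name;
    unsigned int irq;
    bool irq_held;
};

/* Suspend: give the interrupt line back before the device sleeps. */
static int demo_suspend(struct demo_dev *dev)
{
    if (dev->irq_held) {
        mock_free_irq(dev->irq, dev->name);
        dev->irq_held = false;
    }
    return 0;
}

/* Resume: the line was released at suspend time, so claim it again. */
static int demo_resume(struct demo_dev *dev)
{
    assert(!dev->irq_held); /* must follow a matching demo_suspend() */
    if (mock_request_irq(dev->irq, dev->name) != 0)
        return -1;
    dev->irq_held = true;
    return 0;
}
```

After a suspend/resume cycle the line is free while the device sleeps and owned by the driver again afterward. An unmodified driver simply keeps its line across suspend, which is the case the reverted ACPI change no longer handled.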
Posted Aug 4, 2005 9:55 UTC (Thu) by NAR (subscriber, #1313)

So this could be the reason for those "IRQ 11 nobody cared" (or something like that) error messages I got, with an accompanying stack trace... As always, LWN's Kernel Page is worth every cent of my subscription fee.
Posted Aug 5, 2005 19:03 UTC (Fri) by zblaxell (subscriber, #26385)

There are ways to make progress without so much disruption. Believe it or not, people do notice when the kernel starts spitting out printk messages that it didn't spit out before, especially if those messages describe what has to be changed or who has to be notified, and if they are triggered many times by something routine that the user relies on.

Something along the lines of "driver XXX didn't call free_irq before suspend, please fix it" would be nice. I'd even sift through a bunch of "driver XXX _did_ call request_irq after resume" messages to ensure that all the devices in my system were accounted for, and maybe even fix the ones that aren't.

I'd be willing to run the latest kernel on most of my less essential machines if I could expect it to discover and complain about expected future breakage in my specific circumstances, and if I could expect most of the stuff that worked in the previous kernel to work in the next.

On the other hand, if the kernel developers' approach to API and policy changes is to just commit a patch and let everyone else catch up with fixes to things that (although possibly fragile and ugly) were not actually broken in the first place, then I'll only run those kernels when they're well and truly finished (i.e. never), or when I have absolutely no alternative.
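The diagnostics this comment asks for are easy to picture. The user-space sketch below audits a hypothetical irq_owner table just before suspend and again after resume, printing messages worded after the comment's suggestions. The table, the function names and the driver names used with them are invented for illustration; the kernel does not provide these checks under these names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NR_IRQS 16

/* Hypothetical table: irq_owner[n] names the driver holding line n,
 * or is NULL when the line is free. Filled in by hand in this sketch. */
static const char *irq_owner[NR_IRQS];

/* Run just before suspend: complain about every driver that still holds
 * its interrupt line, and return the number of offenders. */
static int audit_irqs_before_suspend(void)
{
    int offenders = 0;
    for (size_t irq = 0; irq < NR_IRQS; irq++) {
        if (irq_owner[irq] != NULL) {
            fprintf(stderr,
                    "driver %s didn't call free_irq before suspend (IRQ %zu), please fix it\n",
                    irq_owner[irq], irq);
            offenders++;
        }
    }
    return offenders;
}

/* Run after resume: for each driver expected to come back, report whether
 * it reacquired a line, and return how many did not. */
static int audit_irqs_after_resume(const char *const expected[], size_t n)
{
    assert(expected != NULL || n == 0);
    int missing = 0;
    for (size_t i = 0; i < n; i++) {
        int found = 0;
        for (size_t irq = 0; irq < NR_IRQS && !found; irq++)
            found = irq_owner[irq] != NULL && strcmp(irq_owner[irq], expected[i]) == 0;
        if (found) {
            printf("driver %s did call request_irq after resume\n", expected[i]);
        } else {
            fprintf(stderr, "driver %s did NOT call request_irq after resume\n", expected[i]);
            missing++;
        }
    }
    return missing;
}
```

Printing the "did call" lines as well as the failures, as the comment suggests, lets a user confirm that every device in the system was accounted for instead of only seeing what went wrong.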
Posted Sep 9, 2005 1:50 UTC (Fri) by mmarq (guest, #2332)

"In the meantime, it will be interesting to see if the 'revert any patch that breaks somebody's machine' policy holds. If it leads to a more stable experience for Linux users, it seems like it would be a good thing."

Sorry, but to me, and perhaps to 90% of users out there from server to desktop, that means: don't change kernels at all until the new one is proven reliable and safe. I'm not trying to annoy anyone, but it still seems to me that there is a slight disconnect between the development world and the user world. More and better modularity would address some issues; until then, it's one whole kernel or another... no patches in the middle.