Re: [PATCH] PM: Introduce PM_EVENT_HIBERNATE (was: Re: i915 hibernation patch (was: Re: 2.6.25-rc2 System no longer ...))
On Sat, 23 Feb 2008, Rafael J. Wysocki wrote: Thanks for testing. Below is the final version of the patch with a changelog etc. Thanks, applied. With this, I also find that I dislike the use of suspend/resume for freezing for STD a lot less. It's still too easy to get confused, but at least now drivers always have total knowledge about what is really going on. I'd not like this interface as a driver writer, but now it's not fundamentally broken any more, just slightly confusing. Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Suspend-devel] 2.6.25-rc2 System no longer powers off aftersuspend-to-disk. Screen becomes green.
On Fri, 22 Feb 2008, Ingo Molnar wrote: btw., why isnt there an in-kernel whitelist, with perhaps a dynamic, convenient /debug/s2r/whitelist append-API for distros (and testers) to add more entries to the whitelist/blacklist? (for cases where the kernel whitelist has not caught up yet) Which would eventually converge to Utopia: s2ram that just works out of box. The big problem with that is - the people who know about the devices are usually not kernel people - the workarounds that the whitelist requires is quite often not a kernel workaround. In other words, the most common workarounds for the s2ram whitelist is usually to do things like running vbetool in user-level to do VGA register save/restore (VBE_POST and VGE_SAVE). Sure, the kernel could do that with usermodehelper etc, but s2ram also has those things as command line flags etc, so... Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: i915 hibernation patch (was: Re: [Suspend-devel] 2.6.25-rc2 System no longer powers off aftersuspend-to-disk. Screen becomes green.)
On Sat, 23 Feb 2008, Rafael J. Wysocki wrote: --- linux-2.6.orig/drivers/char/drm/i915_drv.c +++ linux-2.6/drivers/char/drm/i915_drv.c @@ -222,6 +222,7 @@ static void i915_restore_vga(struct drm_ dev_priv-saveGR[0x18]); /* Attribute controller registers */ + inb(st01); for (i = 0; i 20; i++) i915_write_ar(st01, i, dev_priv-saveAR[i], 0); inb(st01); /* switch back to index mode */ I'm doing this part separately, please drop it - it has nothing to do with the rest of the patch. I'd also suggest that you just add a helper function like int pm_event_powerdown(struct pm_message mesg) { return mesg.event = PM_EVENT_SUSPEND; } or something, so that you can have code like if (pm_event_powerdown(mesg)) pci_set_power_state(pdev, PCI_D3hot); instead of the test for EVENT_SUSPEND/HIBERNATE explicitly. Of course, the places that already do a switch-statement are much better kept that way and just add PM_EVENT_HIBERNATE to the list. --- linux-2.6.orig/drivers/pci/pci.c +++ linux-2.6/drivers/pci/pci.c @@ -554,6 +554,7 @@ pci_power_t pci_choose_state(struct pci_ case PM_EVENT_PRETHAW: /* REVISIT both freeze and pre-thaw should use D0 */ case PM_EVENT_SUSPEND: + case PM_EVENT_HIBERNATE: return PCI_D3hot; Didn't you miss the apci_pci_choose_state() thing that also needs this extension? Index: linux-2.6/drivers/usb/host/u132-hcd.c === --- linux-2.6.orig/drivers/usb/host/u132-hcd.c +++ linux-2.6/drivers/usb/host/u132-hcd.c @@ -3214,14 +3214,18 @@ static int u132_suspend(struct platform_ return -ESHUTDOWN; } else { int retval = 0; -if (state.event == PM_EVENT_FREEZE) { + + switch (state.event) { + case PM_EVENT_FREEZE: retval = u132_bus_suspend(hcd); -} else if (state.event == PM_EVENT_SUSPEND) { + break; + case PM_EVENT_SUSPEND: + case PM_EVENT_HIBERNATE: int ports = MAX_U132_PORTS; while (ports-- 0) { port_power(u132, ports, 0); } -} + break; if (retval == 0) pdev-dev.power.power_state = state; return retval; Looks like a missing close-brace to me there - you removed the final '}'. Or am I blind? Apart from those issues it looks fine to me. Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: i915 hibernation patch (was: Re: [Suspend-devel] 2.6.25-rc2 System no longer powers off aftersuspend-to-disk. Screen becomes green.)
On Sat, 23 Feb 2008, Rafael J. Wysocki wrote: In the revised patch below I redefined the PM_EVENT_* things as flags so that I can or them and defined PM_EVENT_SLEEP in analogy with CONFIG_PM_SLEEP. Ok, looks fine by me. Didn't you miss the apci_pci_choose_state() thing that also needs this extension? No, I didn't. That one operates on the ACPI D* states. Ok. I consider that just about the worst interface ever, but whatever... OK, please have a look at the modified patch below. All right, I'm fine with it. Now we just need to confirm that it works for people.. Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.25-rc2 System no longer powers off after suspend-to-disk. Screen becomes green.
On Thu, 21 Feb 2008, Jeff Chua wrote: After inserting return 0; right at the top of those two functions, suspend (and power-off properly), and resume (without green screen) works just fine. I would like to know what they're for. Try suspend-and-resume without X. Also, try it on one of the more modern laptops - even *with* X. Basically, the kernel wants to be able to do what X does, because it means that when it works, it works _so_ much better than doing it in X. So getting it working is definitely worth it. That said, before you do anything else, try if suspend-to-RAM works. That's the primary goal for this code anyway, and if it works that gives a good hint. Suspend-to-disk is fundamentally different, and it's entirely possible that for the suspend-to-disk case we should just say screw trying to suspend/resume graphics, since you'll have the BIOS resuming text-mode anyway, and there are no performance or debugging advantages. Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.25-rc2 System no longer powers off after suspend-to-disk. Screen becomes green.
On Thu, 21 Feb 2008, Jeff Chua wrote: Works without those two functions. Ahh. You're using the BIOS to re-initialize your video, aren't you? If STR works without X, then you have something else resuming graphics, and that may be what then interacts badly with the fact that the kernel also does so. Ok, what's next? Let's try to narrow it down to what the interaction is. Are you using something like acpi_sleep=s3_bios or similar? That's what the kernel support is supposed to make unnecessary in the long run, along with all the video mode flickering (ie we should be able to resume to the video mode we want, not flicker through unnecessary modes). Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.25-rc2 System no longer powers off after suspend-to-disk. Screen becomes green.
On Thu, 21 Feb 2008, Jeff Chua wrote: That said, before you do anything else, try if suspend-to-RAM works. Linus, guess I missed this part ... so before touch anything, I did tried suspend-to-ram, and it works on console and in X. Ok, so this is with clean current -git, and nothing disabled? And suspend-to-disk hangs, but I can still press and hold the power button to power it off. The press and hold for five seconds is actually a hardware feature of the southbridge (well, I guess there is software in there too, but it's the embedded kind). So the fact that it powers off at that point means nothing, it just means that ok, your kernel is hung, but the hardware still works ;) This *sounds* like some part of the suspend-to-disk sequence is doing something stupid like trying to access the screen after it has been turned off, which doesn't surprise me at all. My oft-stated opinion has been that suspend-to-disk isn't a suspend at all, and should never have been confused with suspending anything. It's snapshot-and-restore, and my opinion is that: - it should *never* call suspend()/resume() at all (that should be reserved purely for suspend-to-RAM and has real power management issues!) - it should have a totally separate halt/unhalt/restore thing that has nothing what-so-ever to do with power management, and is purely about stopping the hardware for things like USB and network cards (which otherwise do things like scan their command lists asynchronously) and making sure that the driver state is consistent with that stopped hw state. - the people who confuse snapshot/restore with suspend/resume are horrible people that cause problems exactly because driver people then get those things mixed up, and something like the video suspend/resume should probably never have impacted suspend-to-disk in the first place! HOWEVER, that's a separate fight I've had, and in the meantime: Then upon powering on and resume, I get the ugly green console screen. I can still type and move around. Starting X runs fine. Ctrl-Alt-Del or switching back to console will get back to the green screen. .. so this implies that while the laptop apparently hung at the end of the snapshotting, the snapshotting did actually work, and it must have hung at the very end, presumably when it tried to actually turn the power off. So there seems to be two (probably largely independent) problems: - the hang at shutdown that requires you to press-and-hold the power button to actually cut power. At a guess: putting the VGA device into D3hot makes the ACPI code that actually does the shutoff unhappy. Probably because it wants to access the device, and ends up not ever getting the replies it wants, since the hardware has been turned off. - the fact that we restore something wrong for you and the screen is green. At a guess: the restore_vga ends up restoring some state that wasn't correctly and fully saved. IOW, I think your patch that disables the two lines actually ends up pretty much matching the two *different* problems. Can you confirm that doing those two parts of that patch individually actually does individually fix the two issues? (Ie disabling D3hot makes it shut down nicely but resume with green text, while disabling just restore_vga() ends up with shutdown problems, but once you press-and-hold the power button, the thing will then restore nicely)+ Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.25-rc2 System no longer powers off after suspend-to-disk. Screen becomes green.
On Wed, 20 Feb 2008, Rafael J. Wysocki wrote: I think we should export the target sleep state somehow. Yeah. By *not* using -suspend() for freezing or hibernate. Please, Rafael - just make the f*cking suspend-to-disk use other routines already. 99% of all hardware needs to do exactly *nothing* on suspend-to-disk, and the ones that really do need things tend to need to not do a whole lot. For example, the freeze action for USB (which is one of the hardest things to suspend) should literally be something like just setting the controller STOP bit, and waiting for it to have stopped. The unfreeze should be to just clear the stop bit, while the restart should be just a controller reset to use the current memory image. NONE OF THIS HAS ABSOLUTELY ANYTHING TO DO WITH SUSPEND. It never did. I've told people so for years. Maybe actually seeing the problems will make people realize. So please, we shouldn't call -suspend[_late] or -resume[_early] at all. Not with PMSG_FREEZE, not with PMSG_*anything*. Can we please get this fixed some day? Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.25-rc2 System no longer powers off after suspend-to-disk. Screen becomes green.
On Wed, 20 Feb 2008, Jesse Barnes wrote: The current callback system looks like this (according to Rafael and the last time I looked): -suspend(PMSG_FREEZE) -resume() -suspend(PMSG_SUSPEND) *enter S3 or power off* -resume() Yes, it's very messy. It's messy for a few different reasons: - the one you hit: a driver actually has a really hard time telling what PMSG_SUSPEND really means. - more importantly, we generally don't want to suspend/resume the hardware at all around a power-off, because we're going to resume with the state at the time of the PMSG_FREEZE, which means that the hardware has actually *changed* and been used in between! that second case is very fundamental for things like USB devices, which in theory you can hold alive over a real suspend event (ie a STR event), but which absolutely MUST NOT be resumed over a suspend-to-disk event, because all the low-level request state is bogus! So the -resume really isn't a resume at all. It's much closer to a -reset. Of course, the solution to this all right now is that we have to reset everything even if it *is* a suspend event, so it basically means that STR ends up using the much weaker model that snapshot-to-disk uses. The fundamental problem being that the two really have nothing what-so-ever to do with each other. They aren't even similar. Never were. And in the long term we could have: -suspend() *enter S3* -resume() Yes, apart from all the complexities (suspend_late/resume_early). So in reality it's more than that, but the suspend/resume things are clearly nesting, and they have the potential to actually keep state around (because we *know* this machine is not going to mess with the devices in between). IOW, here we actually can have as an option assume the device is there when you return. or: -hibernate() *kexec to another kernel to save image* *power off* -return_from_hibernate() (or somesuch) Enough people don't trust kexec that I suspect the right thing simply is -freeze() // stop dma, synchronize device state *snapshot* -unfreeze(); // resume dma *save image* [ optionally -poweroff() ] // do we really care? I'd say no *power off* -restore() // reset device to the frozen one which may have four entry-points that can be illogically mapped to the suspend/resume ones like we do now, but they really have nothing to do with suspending/resuming. And notice how while freeze/restore kind of pairs like a suspend/resume, it really shouldn't be expected to realistically restore the same state at all. The restore part is generally much better seen as a reset hardware than a resume thing. Because we literally cannot trust *anything* about the state since we froze it - we might have booted a different OS in between etc. Very different from suspend/resume. Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.25-rc2 System no longer powers off after suspend-to-disk. Screen becomes green.
On Wed, 20 Feb 2008, Jesse Barnes wrote: Really, in the simple s3 case we still need early/late stuff? Absolutely. Two big reasons: - debuggability I know we don't do this correctly right now, but I want to be able to at least feel like we can some day actually do printk's etc through 99% of the suspend/resume cycle. It's a *huge* thing for debugging problems that happen in the wild, and one of the biggest issues is that we currently usualyl just get a the machine died message when suspend or resume doesn't work. Yes, doing printk's to the Intel management flash stuff can help a lot here, and I want that too, but I'd really like to shut down consoles individually rather than having the big hammer approach that shuts them up entirely over the whole suspend/resume sequence (or not at all, if you use no_console_suspend). And I'd *really* like to do things like VGA-console shutdown in the late phase (and resume early). - it's actually likely *much* simpler for some devices. Simple devices (and that includes things like PCI bridges etc, but also potentially USB host controllers etc) are things that can often be trivially suspended - all the complexity is really not in the controller itself, but beyond, in the bus that it actually drives. And the late-suspend/early-resume means that you don't have to worry about things like interrupts happening while you're suspended. Yes, putting the device into D3 will disable interrupts from that device too (unless there are bugs), *BUT* you may be sharing an interrupt line, and interrupts may be posted and delayed, so an earlier interrupt may well be pending etc. suspending late and resuming early just avoids those issues entirely. Sometimes these things interact. For example, firewire is certainly not trivial to suspend as a subsystem thing (ie all the devices behind the firewire bridge need to do magic things, like spinning down etc that obviously can not happen in the final late phase), but the firewire controller itself is likely trivial to suspend/resume and can easily be handled in the late/early routines. And guess what? It's also exactly what you want to happen in case you end up using the firewire RDMA as a debug aid. IOW, you want that firewire controller (and the PCI bridges) working really early, so that if a problem does happen when you resume some more complex device (say, one of the graphics chips that need X to really come alive), you can use the firewire rdma to read out the kernel log buffer from memory. Well, it seems like we'll have to fix drivers in either case, and isn't a kexec approach fundamentally more sound and simple, design-wise? Rafael pointed out some problems with properly setting wakeup states, but I think that could be overcome... I don't personally mind kexec at all, but on the other hand, I don't care about suspend-to-disk in the first place. I do know that some people really don't want it, and I suspect that they have valid reasons. Ranging from memory use to simply just performance. Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.25-rc2 System no longer powers off after suspend-to-disk. Screen becomes green.
On Wed, 20 Feb 2008, Rafael J. Wysocki wrote: which may have four entry-points that can be illogically mapped to the suspend/resume ones like we do now, but they really have nothing to do with suspending/resuming. Apart from putting devices into the right low power states, that is. And by right low power states you mean wrong low-power states, right? The thing is, they really *are* the wrong states for 99% of all hardware. If you really have a piece of hardware that you want to have the -poweroff() thing do the same as -suspend(), then hey, just use the same function (or better yet, use two different functions with a call to a shared part). Because IT IS NOT TRUE that -suspend() puts the devices in the right power state. The power states are likely to be totally different for S3 and for poweroff, and they are going to differ in different ways depending on the device type. One example would be the one that started this version of the whole discussion (shock horror! We're on subject!) ie when you do a system shutdown, you generally do not even *want* to put individual devices into low-power states at all, because the actual power off the system thing will take care of it for you much better. So to take just something as simple as VGA as an example: you really do not want to suspend that device, because you want to see the poweroff messages until the very end. So that final device -poweroff function really has absolutely *nothing* in common with the device -suspend[_late] functions, simply because almost any sane driver would decide to do different things. Of course, we can continue to do the insane thing and just continue to use inappropriate and misleadign function callback names, and then encodign what the *real* action should be in the argument and/or in magic system-wide state parameters. So in that sense, it's certainly totally the same thing whether we call it -shutdown or -poweroff or -eat_a_banana, since you could always just look at the argument and other clues, and decide that *this* time, for *this* kind of device, the eat a banana callback actually means that we should power it off, but wouldn't it be a lot more logical to just make it clear in the first place that they aren't called for the same reason at all? I'd claim that it's much easier for everybody (and _especially_ for device driver writers) to have static int my_shutdown(struct pci_device *dev, int state) { .. do something .. } static int my_suspend(struct pci_device *dev, int state) { .. do something .. } ... .shutdown = my_shutdown, .suspend = my_suspend, ... than to have static int my_suspend(struct pci_device *dev, state) { .. common code .. if (state == XYZZY) ..special code.. else ..other case code.. } ... .suspend = my_suspend ... even if the latter might be fewer lines. It doesn't really matter if it's fewer, does it, if the alternate version is more obvious about what it does? The other issue is that I've long wanted to make sure that when people fix suspend-to-ram, they don't screw up suspend-to-disk by mistake and vice versa. When a driver writer makes changes, he shouldn't have the kind of illogical oops, unintended consequences issues in general. It should be pretty damn obvious when he changes suspend code vs when he changes snapshot/restore code. We've somewhat untangled that on the core kernel layer, but we've left the driver confusion alone. Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.25-rc2 System no longer powers off after suspend-to-disk. Screen becomes green.
On Thu, 21 Feb 2008, Rafael J. Wysocki wrote: In fact we have acpi_pci_choose_state() that tells the driver which power state to put the device into in -suspend(). If that is used, the device ends up in the state expected by to BIOS for S4. First off, nobody should *ever* use that directly anyway. Secondly, the one that people should use (pci_choose_state()) doesn't actually do what you claim it does. It does all kinds of wrong things, and doesn't even take the target state into account at all. So look again. No. Again, if there are devices that wake us up from S4, but not from S5, they need to be handled differently in the *enter S4* case (hibernation) and in the *enter S5* case (powering off the system). And again, what does this have to do with (the example I used) the graphics hardware? Answer: nothing. The example I gave you we simply DO THE WRONG THING FOR. Same thing for things like USB devices - where pci_choose_state() doesn't work to begin with. Why do we call suspend() on such a thing when we don't want to suspend it? We shouldn't. We should call freeze/unfreeze (which are no-ops) and then finally perhaps poweroff, and that final stage might want to spin things down or similar. But *none* of it has anything to do with suspend, and none of it has anything to do with pci_choose_state() (much less acpi_pci_choose_state) The fact is, we should let the driver decide, and we should make it clear to the driver writer what he is deciding about - rather than basically lie and say suspend the device and put it into D3 even when that's the last thing it should ever do. Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.25-rc2 System no longer powers off after suspend-to-disk. Screen becomes green.
On Thu, 21 Feb 2008, Rafael J. Wysocki wrote: Secondly, the one that people should use (pci_choose_state()) doesn't actually do what you claim it does. It does all kinds of wrong things, and doesn't even take the target state into account at all. So look again. Well, if platform_pci_choose_state() is defined, pci_choose_state() returns its result and on ACPI systems that points to acpi_pci_choose_state(), so in fact it does what I said (apart from the error path). Did you check closer? I repeat: acpi_pci_choose_state() (when called from pci_choose_state()) doesn't even look at the target 'state'. It just blindly assumes that you want the deepest sleep-state you can have. Which happens to be correct for normal suspend, but means that if you want to test other states (through '/sys/devices/.../power'), that sounds broken. I didn't check any closer, but go check it yourself. The short and sweet: acpi_pci_choose_state() totally ignores its 'state' argument. Do you really think that's correct? But yes, pci_choose_state()' effectively does that too, apart from PM_EVENT_ON, which is never used. (But the whole and only point of pci_choose_state() was to do the PM_EVENT_FREEZE thing differently, which it doesn't do, so I think the real issue here is that the interface is really rather mis-designed) I suspect most people who ever really looked and worked on this code had a specific device in mind, and I'm sure that all of the code individually always ends up making sense from the standpoint of some specific device driver. It's just that it never seems to make sense from a bigger issues standpoint, and often seems senseless from the standpoint of other devices of other types. Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.25-rc2 System no longer powers off after suspend-to-disk. Screen becomes green.
On Tue, 19 Feb 2008, Jesse Barnes wrote: I found the same poweroff issue on my T61. It turned out to be related to the C state code disabling interrupts when it shouldn't iirc. Booting with 'idle=poll' seems to work around the problem. The green screen problem should be fixed (see the DRM git tree for details). ..and the latter is hopefully now merged in my tree too (at least some of the drm updates are). Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] acer-wmi - Fail gracefully if ACPI is disabled
On Mon, 11 Feb 2008, Carlos Corbacho wrote: WMI drivers, like their ACPI counterparts, should also check if ACPI is disabled or not, and bail out if so, otherwise we cause a crash. Shouldn't wmi_has_guid() just return false if ACPI isn't enabled, and the drivers should just then always give up? The proper way to get there would seem to be to just do this instead.. We should *not* add some random ACPI workarounds to individual drivers, we should just make the wmi subsystem so robust that nobody *cares* if acpi exists or is enabled on that machine. Linus --- drivers/acpi/wmi.c |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/acpi/wmi.c b/drivers/acpi/wmi.c index 36b84ab..457ed3d 100644 --- a/drivers/acpi/wmi.c +++ b/drivers/acpi/wmi.c @@ -673,11 +673,11 @@ static int __init acpi_wmi_init(void) { acpi_status result; + INIT_LIST_HEAD(wmi_blocks.list); + if (acpi_disabled) return -ENODEV; - INIT_LIST_HEAD(wmi_blocks.list); - result = acpi_bus_register_driver(acpi_wmi_driver); if (result 0) { - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Suspend code ordering (again)
On Thu, 27 Dec 2007, Robert Hancock wrote: I doubt they would prefer the later ordering in any way that matters, if the Windows version they were designed for uses the earlier ordering. Well, I wouldn't say it's abotu preferring one over the other. It's very possible that the BIOS writers were *intending* to prefer ACPI 2.0, and it may even be likely that they thought that they wrote it that way, but the real issue is that it has apparently never ever been *tested* that way. So yes, maybe the vendors actually thought they were a good ACPI-2.0 implementation, but if Windows doesn't do the ordering that the 2.0 spec expects, then that is pretty much just a theoretical thing. But yeah, it would be really nice to have this verified some way. Somebody must already know (whether it's a VM person or a BIOS writer, and whether they'd tell us, is obviously another issue). Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Suspend code ordering (again) (was: Re: x86: Increase PCIBIOS_MIN_IO to 0x1500 to fix nForce 4 suspend-to-RAM)
On Tue, 25 Dec 2007, Rafael J. Wysocki wrote: the ACPI specification between versions 1.0x and 2.0. Namely, while ACPI 2.0 and later wants us to put devices into low power states before calling _PTS, ACPI 1.0x wants us to do that after calling _PTS. Since we're following the 2.0 and later specifications right now, we're not doing the right thing for the (strictly) ACPI 1.0x-compliant systems. We ought to be able to fix things on the high level, by calling _PTS earlier on systems that claim to be ACPI 1.0x-compliant. That will require us to modify the generic susped code quite a bit and will need to be tested for some time. That's insane. Are you really saying that ACPI wants totally different orderings for different versions of the spec? And does Windows really do that? Please don't make lots of modifications to the generic suspend code. The only thing that is worth doing is to just have a firmware callback before the device_suspend() thing (and then on a ACPI-1.0 system, call _PTS *there*), and on an ACPI-2.0 system, call _PTS *after* device_suspend(). Still, the fact is, some (most, I think) drivers *should* put themselves into D3 only in late_suspend(), so if ACPI-2.0 really expects _PTS to be called after that, we're just screwed. That's when the system is really down, interrupts disabled etc, we don't want to call anything but the final ACPI turn us off stuff there! Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] swsusp: Use platform mode by default
On Wed, 17 Oct 2007, Qi Yong wrote: The key point is fall back to shutdown _only_ if !ops, otherwise don't touch hibernation_mode. And that solves my problem. Please, when resurrecting a five-month-old discussion, give more of the old context. I don't know about anybody else, but I get enough email that five months might as well be last century when it comes to email discussions, and I have long since forgotten the details of whatever caused the issue to begin with ;) Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: ACPI power off regression in 2.6.23-rc8 (NOT in rc7)
On Tue, 25 Sep 2007 08:51:09 +0200 Damien Wyart [EMAIL PROTECTED] wrote: Hello, After testing rc8, I noticed that I couldn't power off the computer directly, it only got halted and I had to press the power button manually. Just before displaying System halted, the following message is displayed: ACPI : PCI interrupt for device :02:08.0 disabled I had to first revert 5a50fe709d527f31169263e36601dd83446d5744 then f216cc3748a3a22c2b99390fddcdafa0583791a2 (handling of Sx states) to recover previous behaviour. Hmm. Those things *do* seem to be suspicious. For example, those commits seem to move code that used to be inside CONFIG_PM (which pretty much *everybody* has) to be inside CONFIG_ACPI_SLEEP (which is a totally different thing, and depends on whether the user asked for suspend support or not! Damien - does it work if you ask for SUSPEND or HIBERNATION support? Len - why are you guys moving stuff into CONFIG_PM_SLEEP? I know you seem to think that absolutely *everybody* should always support suspend and hibernation, but the fact is, not everybody does. And it's a totally separate thing for normal ACPI CPU runstate support that people have used to manage a *running* CPU (and shutting it down). Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [1/2] 2.6.23-rc3: known regressions with patches
On Wed, 29 Aug 2007, Michal Piotrowski wrote: Clocks time Subject : double hpet clocksource hard freeze References : http://lkml.org/lkml/2007/8/23/257 Last known good : ? Submitter : Paolo Ornati [EMAIL PROTECTED] Caused-By : Tony Luck [EMAIL PROTECTED] commit 0aa366f351d044703e25c8425e508170e80d83b1 Handled-By : John Stultz [EMAIL PROTECTED] Tony Luck [EMAIL PROTECTED] Patch : http://lkml.org/lkml/2007/8/23/285 Status : patch available Should be solved by 3b2b64fd311c92f2137eb7cee7025794cd854057, although differently from the the patch suggested above. So the regression should be gone a of -rc5. Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: ACPI on Averatec 2370
On Thu, 2 Aug 2007, Cal Peake wrote: On Thu, 2 Aug 2007, Linus Torvalds wrote: That said, the AMD Turion(tm) 64 X2 Mobile Technology TL-52 _should_ be a REV-F CPU afaik, and it should have thus fallen through to the ENABLE_C1E_MASK logic. Afaik that's broken. Cal - can you (a) test that forcing a return 1 from that amd_apic_timer_broken() function fixes it for you. ACK (b) make that function print out the values it uses for debugging (ie the xtended family and model numbers, and the MSR_K8_ENABLE_C1E MSR values)? eax CPUID_XFAM == 0x eax CPUID_XMOD == 0x0004 Yeah, that's a REV-F MSR_K8_ENABLE_C1E lo == 0x04c14015 MSR_K8_ENABLE_C1E hi == 0x lo ENABLE_C1E_MASK == 0 And yeah, that claims that C1E is not on, but: amd_apic_timer_broken: forcing return value of 1 since this makes it all work for you, it does appear that the AMD local timer stops in C1 even when that isn't true, and as such is not useful. Sad. It probably means that we have to disable the local timer for *all* modern AMD CPU's. Thomas/Ingo - did something change in the local apic programming? Or why did this work before? Was it just that we didn't use the local timer apic for some other reason? Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: ACPI on Averatec 2370
On Thu, 2 Aug 2007, Cal Peake wrote: Figured I should have sent that right after I hit the send key... processor : 0 vendor_id : AuthenticAMD cpu family: 15 model : 72 model name: AMD Turion(tm) 64 X2 Mobile Technology TL-52 Sadly, this doesn't show the extended family stuff from cpuid. So it doesn't show any of the bits we actually care about. Sad. That said, the AMD Turion(tm) 64 X2 Mobile Technology TL-52 _should_ be a REV-F CPU afaik, and it should have thus fallen through to the ENABLE_C1E_MASK logic. Afaik that's broken. Cal - can you (a) test that forcing a return 1 from that amd_apic_timer_broken() function fixes it for you. (b) make that function print out the values it uses for debugging (ie the xtended family and model numbers, and the MSR_K8_ENABLE_C1E MSR values)? Andi, can you check with your AMD contacts that those bits are correct.. Maybe the Mobile Technology things *always* have the broken Enhanced Halt State, regardless of any MSR settings? That would perhaps be what makes them Mobile. Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] Introduce CONFIG_HIBERNATION and CONFIG_SUSPEND (updated)
Ok, I took this, and modified Len's patch to re-introduce ACPI_SLEEP on top of it (I took the easy way out, and just made PM_SLEEP imply ACPI_SLEEP, which should make everything come out right. I could have dropped ACPI_SLEEP entirely in favour of PM_SLEEP, but that would have implied changing more of Len's patch than I was really comfy with). Len, Rafael, please do check that the end result looks ok. I suspect ACPI could now take the PM_SLEEP/SUSPEND/HIBERNATE details into account, and that some of the code is not necessary when HIBERNATE is not selected, for example, but I'm not at all sure that it's worth it being very fine-grained. Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: CONFIG_SUSPEND? (was: Re: [GIT PATCH] ACPI patches for 2.6.23-rc1)
On Sat, 28 Jul 2007, Len Brown wrote: That three-liner will crash ACPI+SMP-HOTPLUG_CPU kernels on resume. Explain that to me. There should *be* no resume. ACPI doesn't suspend/resume on its own, I hope. It is all done by the top-level suspend/resume code, not by ACPI. ACPI is a pure helper, and if you've changed that, then I think we need to revert more than a few lines. And it's the *top*level* code that selects HOTPLUG_CPU. Through SUSPEND_SMP (which will select HOTPLUG_CPU) and SOFTWARE_SUSPEND. This is why it's so *totally* and *utterly* bogus for ACPI to select features that it has nothign what-so-ever to do with. In other words: ACPI isn't in the driving seat. Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: CONFIG_SUSPEND? (was: Re: [GIT PATCH] ACPI patches for 2.6.23-rc1)
On Sat, 28 Jul 2007, Linus Torvalds wrote: And it's the *top*level* code that selects HOTPLUG_CPU. Through SUSPEND_SMP (which will select HOTPLUG_CPU) and SOFTWARE_SUSPEND. In other words, the problem seems to be that kernel/power/main.c: suspend_devices_and_enter() does the proper disable/enable_nonboot_cpus(), but it does so without having enabled CPU hotplug. And you seem to think that it's ACPI that should enable the hotplug, even though the code that actually needs it is _outside_ ACPI. And I think that's wrong, and that this is a bug. So I think the real issue is that we allow that suspend_devices_and_enter() code to be compiled without HOTPLUG_CPU in the first place. It's not supposed to work that way. Of course, it may well be that other architectures can happily suspend even with multiple CPU's active, which may be the cause of this mess. But I really think it shouldn't be ACPI that has to select the CPU hotplug, since it's not ACPI that _uses_ it in the first place. Rafael: making a config option for STR (the same way we have a config option for hibernate), and just not allowing it on SMP without HOTPLUG_CPU seems to be the right thing. Len is right in that we do insane things right now (trying to STR with multiple CPU's still active), and I just don't think he's the one that should work around it! Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: CONFIG_SUSPEND? (was: Re: [GIT PATCH] ACPI patches for 2.6.23-rc1)
On Sat, 28 Jul 2007, Rafael J. Wysocki wrote: OK, I'll prepare a patch to introduce CONFIG_SUSPEND, but that will require quite a bit of (compilation) testing on different architectures. Sure. I'm not too worried, the fallout should be of the trivial kind. Also, mind basing it on the (independent) cleanups that Adrian already sent out. This is all intertwined.. Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: CONFIG_SUSPEND? (was: Re: [GIT PATCH] ACPI patches for 2.6.23-rc1)
On Thu, 26 Jul 2007, Rafael J. Wysocki wrote: Hmm, perhaps we should introduce a CONFIG_SUSPEND and change CONFIG_SOFTWARE_SUSPEND into CONFIG_HIBERNATION, both depending on CONFIG_PM? There's quite some code needed only for suspend compiled in when CONFIG_PM is set ... Sounds like a good idea, although I suspect that CONFIG_PM really *is* fairly close to CONFIG_SUSPEND. The thing is, all the stuff it enabled is largely useless without at least STR. (Yes, I realize that you can do per-driver suspend events etc, but I suspect not a lot of people have used it). But from a logical standpoint, it does make sense to have a separate config option for the STR support. Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PATCH] ACPI patches for 2.6.23-rc1
On Thu, 26 Jul 2007, Len Brown wrote: Feel free to share what you know about the benefits vs. the costs of maintaining CONFIG_ACPI_SLEEP as a build option. Why don't you just make CONFIG_ACPI_SLEEP dependent on SOFTWARE_SUSPEND and STR? If you feel that your system has been degraded because it now includes what used to be excluded under CONFIG_ACPI_SLEEP=n, please let me know how. I feel that I get asked to include a feature that (a) I have no interest in on that machine (b) I didn't need to include before. What was the advantage? And what was it that caused something like this to be a post-rc1 thing. That makes me really unhappy. This is a *regression*. Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: CONFIG_SUSPEND? (was: Re: [GIT PATCH] ACPI patches for 2.6.23-rc1)
On Thu, 26 Jul 2007, Rafael J. Wysocki wrote: My point is we have ACPI dependent on PM, so if you want ACPI, you end up with all of the STR stuff built in, which is what you don't like (if I understand that correctly). If we have CONFIG_SUSPEND, you'll be able to choose ACPI alone. :-) Good point. Anyway, I think the ACPI problem really is as trivial as the following three-liner removal fix. If the user doesn't want suspend, ACPI shouldn't force it on him. A nicer fix might be to also make some of the ACPI helper routines depend on whether they are needed or not (which in turn will depend on whether suspend support has been compiled into the kernel), but quite frankly, that's secondary at least for me. So if we have a few ACPI routines that will never get called (because we don't even enable the interfaces that would *cause* them to be called), I don't think that's a huge problem. It's a beauty wart, but nobody really cares (and it's even something that we could get the compiler to optimize away for us if we really cared). Linus --- Don't force-enable suspend/hibernate support just for ACPI It's a totally independent decision for the user whether he wants suspend and/or hibernation support, and ACPI shouldn't care. Signed-off-by: Linus Torvalds [EMAIL PROTECTED] --- drivers/acpi/Kconfig |3 --- 1 files changed, 0 insertions(+), 3 deletions(-) diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig index 251344c..22b401b 100644 --- a/drivers/acpi/Kconfig +++ b/drivers/acpi/Kconfig @@ -11,9 +11,6 @@ menuconfig ACPI depends on PCI depends on PM select PNP - # for sleep - select HOTPLUG_CPU if X86 SMP - select SUSPEND_SMP if X86 SMP default y ---help--- Advanced Configuration and Power Interface (ACPI) support for - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PATCH] ACPI patches for 2.6.23-rc1
On Wed, 25 Jul 2007, Len Brown wrote: git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6.git release Fixes regressions -- a build failure, an oops, some dmesg spam. Also fixes some D-state issues and adds ACPI module auto-loading. Yes, I'd hoped to get the last two in before rc1. I'm hopeful that a couple-days into rc2 is sufficiently early for them. I hate pulling this, but I did. However, what I hate even more after having done so is that ACPI now seems to select CPU hotplug. Why? That is just *broken*. Sure, if you select STR or hibernation, we need CPU hotplug, but just for picking ACPI? Why? Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PATCH] ACPI patches for 2.6.23
On Sun, 22 Jul 2007, Len Brown wrote: please pull from: git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6.git release Pulled. Btw, on a mostly unrelated note (and to balance the fact that mostly I only speak up to complain), I'd like to say how much I appreciate that not only are your git tree histories now clean and readable and don't have unnecessary merges, but the use of clear topic branches really makes it very pleasant to read and see what changed. I'm certainly hoping the topic branches make your life easier too (ie able to track things by topic rather than having changes mixed up), but even if it doesn't, I wanted to say that I appreciate how it makes it easier for outsiders to follow the big outline of your development tree. So I just wanted to say that I appreciate how it's nice to do gitk ORIG_HEAD.. and see the outline of what has been going on in the ACPI tree after I have pulled. And I'm hoping it actually helps you guys too, but it's appreciated regardless. See? Sometimes I can say nice things too. Linus not _always_ grouchy Torvalds - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] swsusp: Use platform mode by default
On Fri, 11 May 2007, Rafael J. Wysocki wrote: We're working on fixing the breakage, but currently it's difficult, because none of my testboxes has problems with the 'platform' hibernation and I cannot reproduce the reported issues. The rule for anything ACPI-related has been: no regressions. It doesn't matter if something fixes 10 boxes, if it breaks a single one, it's going to get reverted. We had much too much of the two steps forward, one step back dance with ACPI a few years ago, which is the reason that rule got installed (and which is why it's ACPI-only: in some other subsystems we accept the fact that sometimes we don't know how to fix some hardware issue, but the new situation is at least better than the old one). I agree that it can be aggravating to know that you can fix a problem for some people, but then being limited by the fact that it breaks for others. But beign able to *rely* on something that used to work is just too important, and with ACPI, you can never make a good judgement of which way works better (since it really just depends on some random firmware issues that we have zero visibility into). Also, quite often, it may *seem* like something fixes more boxes than it breaks, but it's because people report *breakage* only, and then a few months later it turns out that it's exactly the other way around: now it's a hundred people who report breakage with the *new* code, and the reason people thought it fixed more than it broke was that the people for whom the old code worked fine obviously never reported it! So this is why a single regression is considered more important than ten fixes - because a single regressionr report tends to actually be just the first indication of a lot of people who simply haven't tested the new code yet! People for whom the old code is broken are more likely to test new things. So I'd just suggest changing the default back to PM_DISK_SHUTDOWN (but leave the pm_ops-enter testing in place - ie not reverting the other commits in the series). The patch would look something like this. Coywolf, does this fix it for you? Linus --- kernel/power/disk.c |6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) diff --git a/kernel/power/disk.c b/kernel/power/disk.c index b5f0543..f6aa06e 100644 --- a/kernel/power/disk.c +++ b/kernel/power/disk.c @@ -60,9 +60,9 @@ void hibernation_set_ops(struct hibernation_ops *ops) } mutex_lock(pm_mutex); hibernation_ops = ops; - if (ops) - hibernation_mode = HIBERNATION_PLATFORM; - else if (hibernation_mode == HIBERNATION_PLATFORM) + + /* Turn off HIBERNATION_PLATFORM if we no longer have any platform ops */ + if (!ops hibernation_mode == HIBERNATION_PLATFORM) hibernation_mode = HIBERNATION_SHUTDOWN; mutex_unlock(pm_mutex); - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] swsusp: Use platform mode by default
On Fri, 11 May 2007, Rafael J. Wysocki wrote: Just to clarify, the change in question isn't new. It was introduced by the commit 9185cfa92507d07ac787bc73d06c4eec7239 before 2.6.20, at Seife's request and with Pavel's acceptance. Ok, if it's that old, we migt as leave it in. Clearly there weren't many regressions, and this isn't a case of other monsters lurking behind a lack of testers. Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PATCH] ACPI patches for 2.6.22 - part 2
On Thu, 10 May 2007, Linus Torvalds wrote: Seems to work for me. My evo correctly started the fan, and stopped it when the temperature went down again. Looking at things in top, I do end up occasionally seeing spikes where kacpid takes 17% of CPU time, and kacpi_notify takes a few percent too. But the machine works ok, and it doesn't seem to be horrible: 64 ?S 0:15 [kacpid] 65 ?S 0:08 [kacpi_notify] so they've gotten 23 seconds of CPU time over the 37 minutes that laptop has been up now. That's arguably too much, but on the other hand, I did end up trying to stress it out by doing some 3D stuff while compiling the kernel and doing git grep over the kernel tree etc. Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] Add suspend/resume for HPET
On Sat, 31 Mar 2007, Thomas Gleixner wrote: While I agree in principle with the patch, I'm a bit uncomfortable. The sys device suspend / resume ordering is not guaranteed and relies on the registering order. Well, this is why we probably should try to get away from the system device issue, exactly because system devices are totally outside the normal ordering and only have a random linear order. If the clocksources were actually in the device tree, you'd get all the normal guarantees about hierarchical ordering.. Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] Add suspend/resume for HPET
On Sat, 31 Mar 2007, Thomas Gleixner wrote: Right, but clock - sources/events need to be extremly late suspended and early resumed. How can we ensure this ? Make them be at the top of the device tree by adding them early. That's the whole point of the device tree after all - we have an ordering that is enforced by its topology, and that we sort by when things were added. Right now the way things work is (iirc - somebody like Greg should double-check me) is that we add all devices to the power list at device_add() time by traversing the devices fromt he root all the way out, and doing a device_add() which does a device_pm_add(), which in turn adds it to the power-management list - so that the list is always topologically sorted. So the only thing that needs to be done is to make sure that we add the timer devices early during bootup - something we have to do *anyway*. If a device is added early in bootup, that automatically means that it will be suspended late, and resumed early - because we maintain that order all the way through.. Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] Add suspend/resume for HPET
On Sat, 31 Mar 2007, Maxim Levitsky wrote: So maybe I was right afrer all, Maybe it is better to add a suspend/resume hook to each clock source and call it from timekeeping_resume() ? Umm.. WHy not make the device tree look like this: -- clocksource -- +-- HPET | +-- TSC | +-- i8259 | +-- lapic timer | .. whatever else and use the struct device that we *have* for this? The whole struct device is literally designed to do this, and to be embedded into whatever bigger structures you have that describes higher-level behaviour. Ie you'd put a struct device inside the struct clocksource. That thingalready *has* the suspend/resume hooks, and it will mean that people will see the clocks in the device tree rather than have a new notion. Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ PATCH] Add suspend/resume for HPET was: Re: [3/6] 2.6.21-rc4: known regressions
On Thu, 29 Mar 2007, Maxim wrote: On Thursday 29 March 2007 07:08:58 Linus Torvalds wrote: (Or, better yet, shouldn't we set boot_hpet_disable when we decide not to use the HPET, and set hpet_virt_address to NULL?) This is done here out_nohpet: iounmap(hpet_virt_address); hpet_virt_address = NULL; No, that only clears hpet_virt_address, and thus makes all subsequent hpet_readl() and hpet_writel() calls oops. But it doesn't actually *tell* anybody that the HPET is disabled, so if you later on do if (is_hpet_capable()) { time = hpet_readl(..); .. you will just Oops! So as far as I can see, even with your latest patch, if hpet_enable() fails (and triggers the goto out_nohpet cases), you'll just oops immediately when you try to suspend/resume the HPET. THAT was what I meant - when we clear hpet_virt_address, we should also tell all potential subsequent users that the HPET is not there! Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] Add suspend/resume for HPET
On Thu, 29 Mar 2007, Maxim Levitsky wrote: Subject: Add suspend/resume for HPET This adds support of suspend/resume on i386 for HPET Signed-off-by: Maxim Levitsky [EMAIL PROTECTED] --- arch/i386/kernel/hpet.c | 68 +++ Btw, what about arch/x86_64/kernel/hpet.c? That thing seems totally broken. Lookie here: arch/x86_64/kernel/hpet.c:irqreturn_t hpet_rtc_interrupt(int irq, void *dev_id, struct pt_regs *regs) drivers/char/rtc.c:extern irqreturn_t hpet_rtc_interrupt(int irq, void *dev_id); anybody see a problem? The x86-64 version doesn't seem to be very well maintained. Is there some fundamental reason why this file isn't shared across architectures? Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ PATCH] Add suspend/resume for HPET was: Re: [3/6] 2.6.21-rc4: known regressions
On Thu, 29 Mar 2007, Maxim Levitsky wrote: I agree, and as you said I did exactly that: Gaah. I'm blind. Sorry. Your patch did indeed do exactly that, I somehow overlooked it. Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [3/6] 2.6.21-rc4: known regressions
On Wed, 28 Mar 2007, Maxim wrote: Now I don't have a clue how to set those bits if only HPET is used as clock source because now clocksources don't have _any_ resume hook. One thing that drives me wild about that clocksource resume thing is that it seems to think that clocksources are somehow different from any other system devices.. Why isn't the HPET considered a device, and has it's own *device* suspend and resume? Why do we seem to think that only set_mode() etc should wake up clock sources? It's a *device*, dammit. It should save and resume like one (probably as a system device). The set_mode() etc stuff is at a completely different (higher) conceptual level. Thomas? It does seem like Maxim has hit the nail on the head (at least partly) on the HPET timer resume problems.. Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ PATCH] Add suspend/resume for HPET was: Re: [3/6] 2.6.21-rc4: known regressions
On Thu, 29 Mar 2007, Maxim wrote: I am sending here a patch that as was discussed here adds hpet to list of system devices and adds suspend/resume hooks this way. I tested it and it works fine. Ok, it certainly looks better, but it *also* looks like it just assumes the HPET is there. Which would work in testing _with_ a HPET, but would likely break on hardware without one, no? Shouldn't there be at least something like a if (!is_hpet_capable()) return 0; at the top of that init routine? I'd also expect that you'd need to check that hpet_virt_address is valid or something? (Or, better yet, shouldn't we set boot_hpet_disable when we decide not to use the HPET, and set hpet_virt_address to NULL?) Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [2/6] 2.6.21-rc2: known regressions
On Fri, 9 Mar 2007, Ingo Molnar wrote: Some day we may have modesetting support in the kernel for some graphics hw, right now it's pretty damn spotty. having no video is what i'd have expected - but getting a /hang/ is not what i'd have expected. I debugged a case exactly like this (with the TRACE_RESUME() thing) back last November on my laptop. The problem is that radeonfb actually *tries* to re-initialize the graphics chips, but because it doesn't know all the details, it does it wrong, and hangs the whole chip. I forget exactly where it happened, and it might well be different for you. But this is exactly what TRACE_RESUME can be used to pinpoint if you want to try to debug it. Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [2/6] 2.6.21-rc2: known regressions
[ Eric, Ingo, can you double-check the timer initialization after resume? We appear to have several reports of date not advancing, and while this could be some SATA issue, it could easily be a timer tick issue too ] On Thu, 8 Mar 2007, Michael S. Tsirkin wrote: Here's the status with -rc3: better, but still does not work as well as 2.6.20. Ok. I think we mostly solved the irq-related stuff, but you might want to check whether you have CONFIG_NOHZ on or off and whether that makes a difference. 2. First disk access after resume takes a couple of minutes (seemed instant with 2.6.20) during this time no new messages show on console Yeah, there is some problem with SATA resume. It would be beautiful if the people who actually see this could narrow it down with bisection. It works for me is clearly the case for many people, but not all. But before blaming SATA, check if you have NO_HZ enabled and whether disabling that makes it work ok. If timeouts don't work right (or are *extremely* slow) things that should be instant won't be. 3. When I switch to X (CTRL-ALT-F7), X hangs after drawing a couple of windows after waiting for some 10 min, I rebooted. no new messages showed up in /var/log/messages I think this is likely just more of the disk being buggered, but it could again be related to NO_HZ (people report time not advancing, and that would make any X timeout taking forever, and you'd see exactly your behaviour). Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [2/6] 2.6.21-rc2: known regressions
On Fri, 9 Mar 2007, Ingo Molnar wrote: disabling the following radeonfb options in the .config made resume work again: In general, don't even *try* to use radeonfb for suspend/resume. I don't think it has ever worked, except on some very rare laptops (largely PPC Macs) where people had enough information to set up the PLL's. I don't think the other framebuffer drivers are much better. You're better off using the VGA console, and lettign X re-initialize the graphics device. That generally at least has a reasonably good chance of working. Re-initializing graphics modes really is very hard. You can try with the BIOS video hack (I forget the kernel command line to turn it on), but we really do end up depending on X doing it better. Some day we may have modesetting support in the kernel for some graphics hw, right now it's pretty damn spotty. Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.21-rc2 regression vs. 2.6.20: AT keyboard only works with pci=noacpi
On Wed, 7 Mar 2007, Ash Milsted wrote: So, I tracked this down to 2.6.21-git7, the first snapshot that gives me this problem. Hmm. There is no 2.6.21-git7 (that would be the seventh nightly snapshot after 2.6.21 is released, which hasn't happened yet!). Do you mean that it happens between 2.6.20-git6 and 2.6.20-git7? That would be git commits (the way to get them is to look at the *.id file that is associated with a snapshot): 66efc5a7e3061c3597ac43a8bb1026488d57e66b -git6 509cb37e173d4e39cec47238397e91b718730794 -git7 and yes, doing a gitk 66efc5a7..509cb37e does show an input merge. Tellingly it does contain an input tree merge. I would git bisect but I don't have a local copy of the tree - I tried to get one, but it stopped halfway through the clone, probably because I had to use http... So, I hope that helps. Can you try rsync? It's not a great protocol in general, but it's perfectly fine for an initial clone.. After that, since you have already narrowed it down to a particular nightly snapshot, you could do the bisection startign from the known commits already: git bisect start git bisect good 66efc5a7# 2.6.20-git6 was good git bisect bad 509cb37e # 2.6.20-git7 was bad and you'll have less than 500 commits to test (which is quite fast to bisect). If you want to do some manual checking first (ie guessing that the bad behaviour came from that particular input merge), you could first try out commit 2a598df5, which is the head commit before of the merged input tree (this is all trivial to see with the above gitk - the SHA1's may sound scary and esoteric, but they're really easy to look up). That manual check (*if* it turns out that 2a598df5 is indeed the bad one) would cut down the range from good to bad to just 18 commits (the range would be 66efc5a7..2a598df5), and then you should be able to pinpoint the exact bad one from just a few reboots.. (If you have to bisect all 500 commits, it would be ~10 reboots rather than four or five). Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.21-rc2 regression vs. 2.6.20: AT keyboard only works with pci=noacpi
On Wed, 7 Mar 2007, Ash Milsted wrote: Anyway, here's the bootlog for Dmitry from a boot with broken keyboard (2.6.21-rc3): The non-working setup doesn't get any interrupts back, and thus doesn't see the ACK for the \xd4\xed command. It really looks interrupt-related (especially considering that it goes away when you ask ACPI to not do certain things), but at the same time, the differences between -git6 and -git7 really don't seem to have *any* ACPI or PCI irq routing changes, so I think this really is related to the input-layer, and perhaps the real difference between ACPI irq routing and not is just the timing or IO acecss patterns that you get when you use the local apic vs the i8259 legacy irq controller. For example, if there is a edge-triggered interrupt involved (and both keyboard *and* mouse are edge-triggered), the io-apic and the i8259 work differently: temporarily disabling the interrupt will reset the edge trigger logic on the i8259, but not on an IO-APIC. So the lack of interrupts could be due to the input layer not clearing the interrupt source during setup, so some *old* interrupt just stays around, and because it's always set, on an IO-APIC it will never show as an edge at all - but on the i8259 the very action of registering the irq routine will create an edge. There's some reason to believe that you may have a pending interrupt waiting: for the working case, you actually have: Mar 7 23:20:41 joker atkbd.c: Spurious ACK on isa0060/serio0. Some program might be trying access hardware directly. so there was certainly *something* unexpected there. I'm not saying that's it, but it could explain why something that looks interrupt-related and that changes depending on whether you use ACPI to set up interrupts or not can have these kinds of reasons, that just depend on which interrupt controller the kernel happens to use, even though it's not really about lost interrupts at all, but just a driver that doesn't acknowledge a pending one. Or something. Doing the git bisect would really help. Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PATCH] ACPI patches for 2.6.20-rc1
On Wed, 20 Dec 2006, Len Brown wrote: please pull from: Is this really all obvious bug-fixes? There seems to be a lot of development there that simply isn't appropriate after an -rc1 any more. I want 2.6.20 to be stable, and one of the things I'm doing is to be strict about the merge window. Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [linux-pm] [RFC] ACPI vs device ordering on resume
On Fri, 1 Dec 2006, Pavel Machek wrote: So it looks like we need this sequence: enable_nonboot_cpus() /* INIT */ finish() /* _WAK */ device_resume() Can somebody remind me about this immediately after 2.6.19? Remind. But note that freezer is not yet SMP safe... Rafael is working on that. Thanks. On the other hand, I really wonder (and suspect) whether the problem isn't really the freezer or even the kernel resume ordering, but simply an ACPI internal resume ordering thing. Doesn't ACPI have per-device WAK calls anyway? Shouldn't we just call those _individually_ as we walk the device tree (perhaps in the early_resume stage) rather than calling them all in one chunk? Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: ACPI breakage (Re: 2.6.19-rc6: known regressions (v2))
On Sat, 18 Nov 2006, Starikovskiy, Alexey Y wrote: May because it does not have a single common line with the previous patch? Yeah, I do agree that it _looks_ very different as a patch, but it ends up having all the same execution profiles.. It's been too long since I debugged the previous problem, so I don't remember the exact details any more (back then I enabled ACPI debugging and watched the messages scroll by etc - this time I initially thought it was interrupt-related due to the other irq problems we've had, so I started bisecting immediately _without_ doing any ACPI debugging stuff, and by the time I actually bisected down enough, I recognized the problem, so I didn't do all the same enable ACPI messages and look deeply into what is going on thing). But if I remember correctly, what happens is _roughly_ something like this: - thermal event happens - the CPU is getting warm, and the fan needs to start up. Quite often, this happened early during boot (which is quite busy - some init scripts are disgustingly CPU-intensive mainly due to using inefficient scripting languages), but if it didn't happen there, it's easy enough to force to happen other ways. - part of the handling is acpi_os_execute() for something (don't ask me what), but the interestign thing is how that acpi_os_execue() then ends up causing a _recursive_ event. - we handle the original event in kacpid, and hand over the new one as a notification event. But the event keeps on happening, and kacpid keeps on running, and the other thread doesn't actually ever _run_ because kacpid holds he ACPI lock and is constantly busy. - we not only are constantly running in kernel space, we also end up eventually running out of memory for allocating all the work queue entries. So the reason the old code works is because everything is done in a single thread, and yes, we end up getting multiple events, but because the queue is all done onto the same queue that is _handling_ the events in the first place, and because it's a FIFO queue, the notification events get handled _before_ the later events. So with the single-threaded situation, you basically end up always doing the events in the same order they came in. In the two separate threads case, you don't, and one thread will end up generating events forever, waiting for them to happen, but they never _do_ happen, so you have a lockup _and_ eventually an infinite event queue for the other thread. Or may be because it fixes all the current AMD-HP notebooks? Or may be because it did not fail while being in -mm? I'm afraid that -mm doesn't get as much testing as it used to get. Also, I do realize that the patch fixes other problems, but we have long had a very strict policy that we do NOT accept regressions. Immediately when you start accepting regressions, you will never know whether you're going forward of backwards. It's better to have a known _old_ bug than to introduce a new one. So the no regressions! rule ends up trumping pretty much every single other issue. It's unacceptable to have machines that used to work, suddenly stop working. Even if it fixes another machine. ACPI didn't use to have that rule, and it was wild and crazy. Maybe more bugs got fixed, but the problem with accepting regressions is that nobody can _ever_ trust that system. You do not want to have people _afraid_ of upgrading - they should feel confident that upgrading never introduces any new problems. (Of course, that can never be reached 100%, but it's very much part of the goal. It kind of falls into the same backwards compatibility on interfaces absolute goal: it's ok to do new things, but you can never allow them to break old programs) I will not sneak it in again, I promise. Feel free to send me test patches when working on these things, because I have no trouble at all to test my particular machine. I think you'll find the ACPI dumps etc for that machine in your archives, because I've sent them to Len and the acpi lists several times, but if you want to get AML disassemblies etc, just tell me how. I've done them before, but I work on this seldom enough that I always forget what the magic incantations are, and where to get the tools etc. Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: ACPI breakage (Re: 2.6.19-rc6: known regressions (v2))
On Sat, 18 Nov 2006, Starikovskiy, Alexey Y wrote: I've sent you a test patch back in July, but did not get a reply. May be due to OLS? Heh. Whenever you send me something like that, and I don't answer within a few days, you can pretty much depend on me not answering - my mailqueue just fills up too fast. And yeah, it might have been during OLS. Just re-send when it happens. Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: ACPI breakage (Re: 2.6.19-rc6: known regressions (v2))
On Sat, 18 Nov 2006, David Brownell wrote: Running right now with a patch reverting the update which made trouble on Linus' machine, but without Alexey's two tweaks to the EC interrupt handler. So far so good, even after doing things which had previously caused AE_TIME errors pretty quickly. But then, the errors weren't what I'd call reproducible either. Ok, goodie. Adrian, that means that there's one less regression on your list, unless David reports that he can reproduce it again (I don't think he will be able to: all the other ACPI changes looked relatively harmless, at least in the particular area of ACPI changes I looked at) Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
ACPI breakage (Re: 2.6.19-rc6: known regressions (v2))
On Fri, 17 Nov 2006, Adrian Bunk wrote: Subject: nasty ACPI regression, AE_TIME errors References : http://lkml.org/lkml/2006/11/15/12 Submitter : David Brownell [EMAIL PROTECTED] Handled-By : Len Brown [EMAIL PROTECTED] Alexey Starikovskiy [EMAIL PROTECTED] Status : problem is being debugged I do not know if this is related, but testing one of my laptops (always a good idea to check the week before release) shows that my trusty old Compaq N620c locks up rather quickly at boot with the current -git tree. Total lockup - no sysrq, no messages, no nothing. I've mostly bisected it (what the _hell_ did we do before git bisect?), and right now I know: commit 9aaed2b42d00d4abb2748d72d599a8033600e2bf is bad (that's Len's pull trivial into test branch) commit. v2.6.19-rc2 seems all good. Which leaves a chunk of just a few ACPI commits left to bisect. I'll do five or so more reboots, and I should be able to tell exactly which commit breaks. It almost always locks up very early during boot (generally during the initializing udev phase), although sometimes it survives a bit further.. Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: ACPI breakage (Re: 2.6.19-rc6: known regressions (v2))
On Fri, 17 Nov 2006, Linus Torvalds wrote: Total lockup - no sysrq, no messages, no nothing. Dammit. It looks like 37605a6900f6b4d886d995751fcfeef88c4e462c, and I should have realized that immediately. That commit re-introduces the bug that we already reverted once. Why the hell did that idiotic thing go in, when we had to revert it once already (see commit 72945b2b90a5554975b8f72673ab7139d232a121 for the earlier revert). It was broken then, it is broken now. Nothing has changed. Why did you guys try to sneak it in again? Last time this same use a second workqueue patch went in (in a different form), we had _exactly_ the same problems, with total lockups, and way too high CPU usage. The bugzilla entry that you refer to in that commit is even the same one that discussed why the _original_ patch was totally broken. It's even the same AUTHOR who wrote the original buggy patch, that pushed through the same buggy patch AGAIN. Dammit, this is frustrating. Why did people expect it to suddenly not be buggy? Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [linux-pm] [RFC] ACPI vs device ordering on resume
On Wed, 15 Nov 2006, Len Brown wrote: So it looks like we need this sequence: enable_nonboot_cpus() /* INIT */ finish() /* _WAK */ device_resume() Can somebody remind me about this immediately after 2.6.19? No way am I going to make that kind of a major ordering change right now, especially as the thing isn't apparently even a real regression (just more fallout from enabling MSI). It would be good if people who are affected (and people in general that are interested in power management) would test out the patch. In particular, this is an even bigger change than the one suggested by Stephen. It means, for example, that we will have device resume (and process thawing) being called when we're in SMP mode - and that may or may not have issues. So this is a really scary patch to me, no way in hell will I apply it right now. But if people try it out, it would be good.. Btw, this is a clear example of where it might be good to start actively using the early_resume thing, and do PCI bus resume there. We might want to do early_resume _before_ calling the bios (I'm not at all convinced that the firmware will do the right thing without the PCI buses set up, for example), and _before_ thawing processes. Then, we might let individual devices do their device resume in the normal late resume phase. Greg already carries the patch around for that PCI bus early resume thing, I think. We need to test these things. Linus diff --git a/kernel/power/main.c b/kernel/power/main.c index 873228c..2989609 100644 --- a/kernel/power/main.c +++ b/kernel/power/main.c @@ -132,12 +132,12 @@ int suspend_enter(suspend_state_t state) static void suspend_finish(suspend_state_t state) { - device_resume(); - resume_console(); - thaw_processes(); enable_nonboot_cpus(); if (pm_ops pm_ops-finish) pm_ops-finish(state); + device_resume(); + resume_console(); + thaw_processes(); pm_restore_console(); } - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] ACPI vs device ordering on resume
On Tue, 14 Nov 2006, Stephen Hemminger wrote: During the pci_restore process, the MSI information and the PCI command register are restored properly. But later during resume, inside the ACPI evaluation of the WAK method, the PCI_COMMAND INTX_DISABLE (0x400) flag is being cleared. My guess is that the BIOS ends up doing some resetting of devices. Hmm.. I think calling pm_ops-finish() early is the right thing to do, so your patch looks fine. I'd hate to apply it at this stage in the 2.6.19 development, though. Comments? Looking at the code (at least for ACPI), pm_ops-finish() function does things that we'd generally want done early. I'm a bit nervous about the nesting, though - we call the -prepare function _before_ we do the device suspend stuff, so if -finish undoes something that was done by -prepare, then we will restore the devices with the state they were in _after_ the -prepare. So from a logical _nesting_ perspective, the -finish() routine should happen after devices have been restored. And anything that ACPI does to undo -enter should have been done early by ACPI itself as it was exiting that -enter routine - that would make the things nest properly. Gaah. Since there is no way to know what the HELL those ACPI routines will actually end up doing, there is no way to know what is the right answer. Does anybody know what Windows does? I think Stephen's patch is likely good, but I really don't see myself applying it to my tree right now. It should either go into -mm, or early into the post-2.6.19 tree (or, most likely, both). Anybody? Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.19-rc - ThinkPads
On Wed, 1 Nov 2006, Andi Kleen wrote: Ok please revert the i386 patch for now then if it fixes the ThinkPads. The x86-64 version should be probably fixed too, but doesn't cleanly. I will send you later a patch to fix this there properly. Actually, I should have just fixed the ordering. I did some cleanups too, but those are unrelated (except in the sense that I wanted to look at the assembly code, and the cleanups made the code generation at least half-way sane!) I've pushed out the changes, but here is the part that may or may not matter for anybody who wants to test it if they don't use git or if it hasn't mirrored out yet. Michael? Martin? Andi: I think the patches should work pretty much as-is for x86-64 too, since all the issues would seem to be similar. I'm not entirely happy with ioapic_write_entry() now either (if we change an entry that was already unmasked, we should probably mask it first by writing the low word with the mask bit set, then write the high word, and then write the low word again), but - this makes us match the ordering we _used_ to have, so if the cleanup broke things for people, this should unbreak it, and at least not be any worse than it used to be. - when we write new unmasked entries, they all _should_ have been masked before, so hopefully the change a unmasked entry while it's unmasked case doesn't actually ever happen. But I didn't actually _check_. Somebody should look into that case. Does anybody feel like they want to learn more about the IO-APIC? Halloween is over and gone, but if you want to scare small children _next_ year, telling them about the IO-APIC is likely a good strategy. Linus --- commit f9dadfa71bc594df09044da61d1c72701121d802 Author: Linus Torvalds [EMAIL PROTECTED] Date: Wed Nov 1 10:05:35 2006 -0800 i386: write IO APIC irq routing entries in correct order Since the mask bit is in the low word, when we write a new entry, we need to write the high word first, before we potentially unmask it. The exception is when we actually want to mask the interrupt, in which case we want to write the low word first to make sure that the high word doesn't change while the interrupt routing is still active. Signed-off-by: Linus Torvalds [EMAIL PROTECTED] diff --git a/arch/i386/kernel/io_apic.c b/arch/i386/kernel/io_apic.c index eb10bd5..507983c 100644 --- a/arch/i386/kernel/io_apic.c +++ b/arch/i386/kernel/io_apic.c @@ -147,12 +147,34 @@ static struct IO_APIC_route_entry ioapic return eu.entry; } +/* + * When we write a new IO APIC routing entry, we need to write the high + * word first! If the mask bit in the low word is clear, we will enable + * the interrupt, and we need to make sure the entry is fully populated + * before that happens. + */ static void ioapic_write_entry(int apic, int pin, struct IO_APIC_route_entry e) { unsigned long flags; union entry_union eu; eu.entry = e; spin_lock_irqsave(ioapic_lock, flags); + io_apic_write(apic, 0x11 + 2*pin, eu.w2); + io_apic_write(apic, 0x10 + 2*pin, eu.w1); + spin_unlock_irqrestore(ioapic_lock, flags); +} + +/* + * When we mask an IO APIC routing entry, we need to write the low + * word first, in order to set the mask bit before we change the + * high bits! + */ +static void ioapic_mask_entry(int apic, int pin) +{ + unsigned long flags; + union entry_union eu = { .entry.mask = 1 }; + + spin_lock_irqsave(ioapic_lock, flags); io_apic_write(apic, 0x10 + 2*pin, eu.w1); io_apic_write(apic, 0x11 + 2*pin, eu.w2); spin_unlock_irqrestore(ioapic_lock, flags); @@ -274,9 +296,7 @@ static void clear_IO_APIC_pin(unsigned i /* * Disable it in the IO-APIC irq-routing table: */ - memset(entry, 0, sizeof(entry)); - entry.mask = 1; - ioapic_write_entry(apic, pin, entry); + ioapic_mask_entry(apic, pin); } static void clear_IO_APIC (void) - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.19-rc - ThinkPads
On Wed, 1 Nov 2006, Michael S. Tsirkin wrote: I've pushed out the changes, but here is the part that may or may not matter for anybody who wants to test it if they don't use git or if it hasn't mirrored out yet. Michael? Martin? I pulled the latest git, and seems to work for me, thanks. This still could be a false negative (happened already) so I'll continue using this, and will post the results. Ok, thanks. Somebody should look into that case. Does anybody feel like they want to learn more about the IO-APIC? Halloween is over and gone, but if you want to scare small children _next_ year, telling them about the IO-APIC is likely a good strategy. Hmm, sounds interesting :) Is this a good place to start (I'm feeling lucky hit for IO-APIC)? http://www.intel.com/design/chipsets/datashts/290566.htm Yeah, that's the datasheet. Note that a lot of the IO-APIC complexity is not so much in the programming interfaces themselves, but in keeping track of how the heck the thing is connected (ie ExtINT vs SCI vs normal apic interrupt etc). Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.19-rc - ThinkPads
On Wed, 1 Nov 2006, Ernst Herzberg wrote: still bisecting, will report the result. Figuring out what caused an apparent change of behaviour is definitely a good idea - it might give us some clue to what really is going on. (Or it might not. Sometimes the patch that triggers changes really doesn't seem to have anything to do with anything, and it literally was just a latent bug that just happened to be exposed by something that had nothing to do with anything at all but perhaps timing. But that's pretty rare in the end. It happens, but it's definitely not the common case at all, and I think it's great that you're bisecting even if there is a possibility that we'll be left with a big Huh? Whaa? as the end result ;^) Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.19-rc - ThinkPads
On Wed, 1 Nov 2006, Michael S. Tsirkin wrote: I've been bisecting ACPI/suspend thinkpad issue myself and I seem to get eea0e11c1f0d6ef89e64182b2f1223a4ca2b74a2 good, cf4c6a2f27f5db810b69dcb1da7f194489e8ff88 bad. Very interesting.. That commit cf4c6a2f on the face of it looks like an obvious cleanup, but looking closer, it actually changes the order of the apic writes in many cases (high word first - low word first). It also does something else that looks really really wrong: it turns an atomic update ioapic and set irq-info into two separate events, where interrupts can happen in between. Same goes for resume (instead of atomically changing all entries with the ioapic lock held, it now does them individually, and locks them individually). I wonder if the order matters more, though. Andi? We _used_ to write the high word first, and I think the order matters. The low word contains the enable bit, for example, so when enabling an interrupt, you should write the low word last, when you disable it you should write the low word first. And that's exactly what we _used_ to do, and it's what that particular commit totally and utterly _broke_. I think that commit should either be reverted, or the code should be fixed to do the writes in the proper order. I suspect reverting it is the right thing to do - the patch only introduces bugs, an doesn't actually _fix_ anything, it just cleans things up. Andi, you need to be a hell of a lot more careful! Apparently x86-64 is also totally broken in this regard, because somebody didn't realize that the ordering _matters_. Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.19-rc3: known unfixed regressions (v3)
On Mon, 30 Oct 2006, Jun'ichi Nomura wrote: Please revert the patch. I'll fix the wrong error handling. I'm not sure reverting the patch solves the ACPI problem because Michael's kernel seems not having any user of bd_claim_by_kobject. Yeah, doing a grep does seem to imply that there is no way that those changes could matter. Michael, can you double-check? I think Jun'ichi is right - in your kernel, according to the config posted on bugzilla, I don't think there should be a single caller of bd_claim_by_disk, since CONFIG_MD is disabled. So it does seem strange. But if you bisected to that patch, and it reliably does _not_ have problems with the patch reverted, maybe there is some strange preprocessor thing that makes grep not find the caller. Michael, you also reported: Reset to d7dd8fd9557840162b724a8ac1366dd78a12dff seems to hide part of the issue (I have ACPI after kernel build, but not after suspend/resume). Both reverting this patch, and reset to the parent of this patch seem to solve (or at least, hide) both problems for me (no ACPI after suspend/resume and no ACPI after kernel build). (where that d7dd8f.. is actually missing the initial 4 - I think you cut-and-pasted things incorrectly). So I wonder.. You still had ACPI working _after_ the kernel build even with that patch in place, and it seems that suspend/resume is the real issue. Martin Lorenz reports on the same bugzilla entry, and he only has problems with suspend/resume. I assume that compile the kernel just triggers some magic ACPI event (probably fan-related due to heat), and I wonder if the bisection faked you out because once you get close enough the differences are small enough that the kernel compile is quick and the heat event doesn't actually trigger? See what I'm saying? Maybe the act of bisecting itself changed the results, and then when you just revert the patch, you end up in the same situation: you only recompile a small part (you only recompile that particular file), and the problem doesn't occur, so you'd think that the revert fixed it. If it's heat-related, it should probably trigger by anything that does a lot of CPU (and perhaps disk) accesses, not just kernel builds. It might be good to try to find another test-case for it than a kernel recompile, one that doesn't depend on how much changed in the kernel.. Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG] Oops on boot (probably ACPI related)
On Wed, 27 Sep 2006, Linus Torvalds wrote: On Wed, 27 Sep 2006, Andi Kleen wrote: I expect this patch to fix it. Andrew, Kyle, can you verify? Not that it really matters. Andi sure as hell pinpointed a real problem with the new and broken inline asm. That's almost certainly the bug that crept in during the recent rewrite. HOWEVER, now that I look more closely at the rewrite, I'm really wondering whether the rewrite was worth it at all. It generates smaller code, but at the expense of - the actual cache-footprint is bigger - the branch will now be mis-predicted by default Since the smaller code really only tends to matter from a cache usage standpoint, I don't know if I'm at all convinced. The fact that rewinders have problems is fairly immaterial. Maybe we should just take this as a hint that all the stupid rewinding code was wrong in the first place, and we should stop doing that? We can go back to just printing out our stacktrace guesses, that has worked for us for a long time, and the stack unwinding simply looks _fundamentally_ flawed. So I have a real urge to just revert that change anyway. Are there any _real_ advantages to this broken unwinding code that has had more bugs that Windows XP? Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG] Oops on boot (probably ACPI related)
On Wed, 27 Sep 2006, Andi Kleen wrote: It doesn't matter much because these days this stuff is all out of lined anyways and in a single function. And the dynamic branch predictor in all modern CPUs will usually cache the decision (unlocked) there. Ahh, good point. Once there's only one copy, the branch predictor will get it right (and the code size won't much matter) Are there any _real_ advantages to this broken unwinding code that has had more bugs that Windows XP? I thought for a long time we didn't need it either, but these days with all these callbacks in some parts of the kernel (driver model, others) and you get a oops with 60+ entries it is just too much trouble to figure it out manually. Ok, fair enough. I'll apply your fix (which in itself is obviously correct). I just wanted to bring up the possibility that we should just remove the (fragile) unwinder. But let's leave it for another day, if it keeps being problematic. Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: kacpi_notify?
On Thu, 13 Jul 2006, Starikovskiy, Alexey Y wrote: I'm terribly sorry that my patch broke on your machine. May I ask you to send me or attach to #5534 output of acpidump from this machine? I'll send it in another email, since I already generated it for Len ;) Do you think that the whole idea is crap, or if I limit number of possible spawned threads and forsibly put current thread to sleep (which will release ACPICA executer mutex), as it happens in DSDT of nx6125 it will be possible to use it? I don't think the _idea_ is crap per se, but it would at a minimum need a thread limit. But I think it's the wrong approach: especially if you put the current thread to sleep, you really don't want another thread at all, you are really just working around a problem that is totally internal to acpi (and the AML interpreter in particular). So I think the problem really lies elsewhere, and that the whole thread approach was trying to paper over it. And having a limited set of threads is probably potentially _worse_ then what we have now. Is there no way to have the AML interpreter have some state, and just push that current interrupted state back onto the event queue, and just start executing the new one instead? That sounds like it should fix the _real_ problem - a kind of mini-scheduler for ACPI events? Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
kacpi_notify?
Hmm. What's up with this? 2341 ?D 0:00 [kacpid_notify] 2342 ?D 0:00 [kacpid_notify] 2343 ?D 0:00 [kacpid_notify] 2344 ?D 0:00 [kacpid_notify] 2345 ?D 0:00 [kacpid_notify] 2346 ?D 0:00 [kacpid_notify] 2347 ?D 0:00 [kacpid_notify] ... (apparently about 300 of those processes, at which point the machine just hangs, because even root cannot start any new processes, and I couldn't actually get to debug this at all). What would it be waiting on, and why? This machine doesn't have any module support (at all), and I haven't booted a new kernel on it in quite a while, so this isn't necessarily new behaviour, but the last kernel I tried (which did _not_ have this problem, obviously) was in April (commit 6e5882cfa24e1456702e463f6920fc0ca3c3d2b8, to be exact). Now, that's 6000+ commits ago, so I'd rather not even bisect this, if somebody can come up with a more obvious explanation of why kacpid_notify would be started over and over and over again, only to always get stuck.. Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kacpi_notify?
On Wed, 12 Jul 2006, Linus Torvalds wrote: (apparently about 300 of those processes, at which point the machine just hangs, because even root cannot start any new processes, and I couldn't actually get to debug this at all). With ACPI debugging, I notice that it finally dies due to ACPI Error AE_NO_MEMORY. Which I guess is just due to thousands of kacpi_notify processes, and tons of allocations. With ctrl+scrolllock, I finally got something. The traceback for the D-state (millions and millions of them) is __down_failed acpi_ut_acquire_mutex acpi_ex_enter_interpreter acpi_ns_evaluate acpi_evaluate_object acpi_evaluate_integer acpi_os_execute_thread acpi_thermal_get_temperature acpi_thermal_check .. and 'kacpid' seems to be stuck using all CPU time, with the thing doing something like: EIP is at delay_tsc+0xb/0x13 EFLAGS: 0283Not tainted (2.6.18-rc1-g155dbfd8 #24) EAX: 4aa48900 EBX: 00026be1 ECX: 4aa40b7e EDX: 001a ESI: EDI: c039300d EBP: c0390df3 DS: 007b ES: 007b CR0: 8005003b CR2: 080516f0 CR3: 362dc000 CR4: 06d0 [c01c94c0] __delay+0x6/0x7 [c01f23ef] acpi_os_stall+0x1d/0x29 [c0201f11] acpi_ex_system_do_stall+0x37/0x3b [c0200fca] acpi_ex_opcode_1A_0T_0R+0x85/0xc8 [c01f5308] acpi_ds_exec_end_op+0x133/0x553 [c020d0f3] acpi_ps_parse_loop+0x777/0xbe0 [c020c488] acpi_ps_parse_aml+0xd8/0x2d5 [c020dbbe] acpi_ps_execute_pass+0xa9/0xd2 [c020dd6a] acpi_ps_execute_method+0x153/0x231 [c02095e1] acpi_ns_evaluate+0x179/0x24c [c01fc12e] acpi_ev_asynch_execute_gpe_method+0xeb/0x159 [c01f2083] acpi_os_execute_deferred+0x19/0x21 [c01226a0] run_workqueue+0x68/0x95 [c01f206a] acpi_os_execute_deferred+0x0/0x21 [c0122b2e] worker_thread+0xf9/0x12b [c03570bf] schedule+0x469/0x4cc [c0113bfb] default_wake_function+0x0/0xc [c0122a35] worker_thread+0x0/0x12b [c01249bb] kthread+0xad/0xd8 [c012490e] kthread+0x0/0xd8 [c0101005] kernel_thread_helper+0x5/0xb which I assume is the thing that holds the AML semaphore, and isn't releasing it. Is there any sane debugging info I can send people? Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: kacpi_notify?
On Wed, 12 Jul 2006, Linus Torvalds wrote: I've got a hundred-odd commits to go, but the next bisection test happens to be the parent of my merge (your merge linus into release branch merge: ae6c859b7dcd708efadf1c76279c33db213e3506), so if I'm right, I'd expect that to be a bad tree. Yup. And yes, the problem seems to co-incide with getting about 300 acpi interrupts per second. After about 9500 interrupts (each of which seems to create one of these things), the machine is basically dead. Ten thousand kacpid_notify threads is too much. Regardless of what brought on this bug, I think there's something wrong in anything that keeps on notifying things without keeping track of how many outstanding notifications it already has. Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: kacpi_notify?
On Wed, 12 Jul 2006, Brown, Len wrote: Likely related to bugzilla-5534 b8d35192c55fb055792ff0641408eaaec7c88988 Well, that one certainly looks likely. Any reason to not just revert it? The fundamental problems that it introduces are obviously much worse than the fix. Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bug 5534] No thermal events until acpi -t - HP nx6125
On Wed, 12 Jul 2006, Linus Torvalds wrote: Any reason to not just revert it? The fundamental problems that it introduces are obviously much worse than the fix. Ok, that commit b8d35192c55fb055792ff0641408eaaec7c88988 is definitely horribly horribly broken. I'm going to revert it, because the fix is much worse than the problem it fixes. Instead of a fan not coming on, I now have ten thousand threads killing the machine instead - and the fan _still_ doesn't come on.. The thread approach doesn't even fix the fundamental problem itself. It doesn't help to start a new thread, when the AML interpreter holds a semaphore over the sleep, causing the events to be serialized, and the thermal events to be delayed _anyway_. The only thing the threading causes is that it guarantees that the machine ends up being totally overwhelmed by the thousands of threads, all blocked on the same semaphore. I don't know what the solution should be, but in the meantime, the fix is definitely unacceptable. Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: ACPI_DOCK bug: noone cares
On Sun, 9 Jul 2006, Brown, Len wrote: Fair enough. Reverted. I disagree with this decision, and would like to know what is necessary to reverse it. Mistakes happen. Fair enough. They happen all the time. This time around, for the 2.6.18-rc1 thing, I had heard more than the usual nobody even reacted, as Andrew had held up two patch-series of his because of that issue.. So that makes me like it even less than usual when I'm told that a problem with something I merged was apparently known BEFORE IT WAS MERGED. So Adrian's report on its own wouldn't have caused a revert. If you address me directly when you are asking me to do something, that would really help me help you. As far as I can tell, you were cc'd on all of these things, along with the linux-acpi mailing list. Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: ACPI_DOCK bug: noone cares
On Sun, 9 Jul 2006, Brown, Len wrote: So I ask you. If I fix the Kconfig issue today, will you accept a push that restores this driver to 2.6.18? Sure. Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Resume on Apple Mac Mini
It turns out that the Apple Mac Mini comes back from suspend in legacy mode. The simplest and most obvious fix would seem to be the following trivial one-liner, which just enables ACPI mode after any suspend event. Comments? Can anybody see any downsides to just doing something this obvious, and doing it unconditionally? I've seen a much more complicated patch that only does this for Apple hardware, but I wouldn't be surprised at all if this is more common. In fact, googling for the symptoms (nobody cared and acpi_irq and suspend) gives quite a number of hits, so I would not be surprised at all if this is not Apple-related at all, but that we should always have done this. Comments? Linus --- diff --git a/drivers/acpi/sleep/main.c b/drivers/acpi/sleep/main.c index 56f861e..7b6c146 100644 --- a/drivers/acpi/sleep/main.c +++ b/drivers/acpi/sleep/main.c @@ -109,6 +108,11 @@ static int acpi_pm_enter(suspend_state_t local_irq_restore(flags); printk(KERN_DEBUG Back to C!\n); + /* +* Make sure we're back in ACPI mode! +*/ + acpi_enable(); + /* restore processor state * We should only be here if we're coming back from STR or STD. * And, in the case of the latter, the memory image should have already - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Resume on Apple Mac Mini
On Sat, 17 Jun 2006, Linus Torvalds wrote: It turns out that the Apple Mac Mini comes back from suspend in legacy mode. Actually, that wasn't it. My debugging patches use the RTC to save off information, and that causing a test-kernel to not compile the test, or just normal incompetence on my part, made me confirm this as working, when it isn't. The thing that does work is to literally just set the SCI_ENABLE bit: acpi_set_register(ACPI_BITREG_SCI_ENABLE, 1, ACPI_MTX_DO_NOT_LOCK); after resume, _not_ the acpi_enable(). Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: git pull on Linux/ACPI release tree
On Mon, 9 Jan 2006, Linus Torvalds wrote: One thing we could do is to make it easier to apply a patch to a _non_current_ branch. [ ... ] Do you think that kind of workflow would be more palatable to you? It shouldn't be /that/ hard to make git-apply branch-aware... (It was part of my original plan, but it is more work than just using the working directory, so I never finished the thought). Btw, this is true in a bigger sense: the things git does have largely been driven by user needs. Initially mainly mine, but things like git-rebase were from people who wanted to work as sub-maintainers (eg Junio before he became the head honcho for git itself). But if there are workflow problems, let's try to fix them. The apply patches directly to another branch suggestion may not be sane (maybe it's too confusing to apply a patch and not actually see it in the working tree), but workflow suggestions in general are appreciated. We've made switching branches about as efficient as it can be (but if the differences are huge, the cost of re-writing the working directory is never going to be low). But switching branches has the confusion factor (ie you forget which branch you're on, and apply a patch to your working branch instead of your development branch), so maybe there are other ways of doing the same thing that might be sensible.. So send suggestions to the git lists. Maybe they're insane and can't be done, but while I designed git to work with _my_ case (ie mostly merging tons of different trees and then having occasional big batches of patches), it's certainly _supposed_ to support other maintainers too.. Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: git pull on Linux/ACPI release tree
On Mon, 9 Jan 2006, Luben Tuikov wrote: Yes. Ever since I started used git, I never used branch switching, but I do have git branches and I do use git branching. I basically have a branch per directory, whereby the object db is shared as is remotes/refs/etc, HEAD and index are not shared of course. This allows me to do a simple and fast cd to change/go to a different branch, since they are in different directories. So the time I wait to switch branches is the time the filesystem takes to do a cd. This also allows me to build/test/patch/work on branches simultaneously. Yes. It has many advantages, and it's the approach I pushed pretty hard originally, but the many branches in the same tree approach seems to have become the more common one. Using many branches in the same tree is definitely the better approach for _distribution_, but that doesn't necessarily mean that it's the better one for development. For example, you can have a git distribution tree with 20 different branches on kernel.org, but do development in 20 different trees with just one branch active - and when you do a git push to push out your branch in your development tree, it just updates that one branch on the distribution site. So git certainly supports that kind of behaviour, but nobody I know actually does it that way (not even me, but since I tend to just merge other peoples code, I don't actually have multiple branches: I create temporary branches for one-off things, but don't maintain them that way). Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git pull on Linux/ACPI release tree
On Mon, 9 Jan 2006, Martin Langhoff wrote: I think it does. All the tricky stuff that David and Junio have been discussing is actually done very transparently by git-rebase upstream Yes, it's fairly easy to do. That said, I would actually discourage it. I haven't said anything to David, because he is obviously very comfy with the git usage, and it _does_ result in cleaner trees, so especially since the networking code ends up being the source of a lot of changes, the extra cleanup stage that David does might actually be worth it for that case. But git is actually designed to have parallel development, and what David does is to basically artificially linearize it. We merge between us often enough that it doesn't really end up losing any historical information (since David can't linearize the stuff that we already merged), but in _theory_ what David does actually does remove the historical context. So git-rebase is a tool that is designed to allow maintainers to (as the command says) rebase their own development and re-linearize it, so that they don't see the real history. It's basically the reverse of what Len is doing - Len mixes up his history with other peoples history in order to keep them in sync, while David bassically re-does his history to be on top of mine (to keep it _separate_). The git-rebase means that David will always see the development he has done/merged as being on top of whatever my most recent tree is. It's actually a bit scary, because if something goes wrong when David re-bases things, he'll have to clean things up by hand, and git won't help him much, but hey, it works for him because (a) things seldom go wrong and (b) he appears so comfortable with the tool that he _can_ fix things up when they do go wrong. And yes, git-rebase can be very convenient. It has some problems too (which is the other reason I don't try to convince other maintainers to use it): because it re-writes history, a change that _might_ have worked in its original place in history might no longer work after a rebase if it depended on something subtle that used to be true but no longer is in the new place that it has been rebased to. Which just means that a commit that was tested and found to be working might suddenly not work any more, which can be very surprising (But I didn't change anything!). On the other hand, this is no different from doing a merge of two independent streams of development, and getting a new bug that didn't exist in either of the two, just because they changed the assumptions of each other (ie not a _mismerge_, but simply two developers changing something that the other depended on it, and the bug only appears when both the working trees are merged and the end result no longer works). So my suggested git usage is to _not_ play games. Neither do too-frequent merges _nor_ play games with git-rebase. That said, git-rebase (and associated tools like git-cherry-pick etc) can be a very powerful tool, especially if you've screwed something up, and want to clean things up. Re-doing history because you realized that a you did something stupid that you don't want to admit to anybody else. So trying out git-rebase and git-cherry-pick just in case you decide to want to use them might be worthwhile. Making it part of your daily routine like David has done? Somewhat questionable, but hey, it seems to be working for David, and it does make some things much easier, so.. Linus - To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git pull on Linux/ACPI release tree
On Mon, 9 Jan 2006, Al Viro wrote: How do you deal with conflict resolution? That's a genuine question - I'm not talking about deliberate misuse to hide an attack, just a normal situation when you have to resolve something like A: if (foo) bar B: if (foo baz) bar A': #ifdef X if (foo) bar ... #endif merge of A' and B: trivial conflict Actually, these days git is pretty good at it. Much better than CVS, certainly. You can see the conflict against my old tree, or conflict against the remote tree by using the --ours or --theirs flag to git diff respectively. (Or diff conflict against common base: git diff --base). So for your particular example with a trivial base file: line1 2 3 if (foo) bar 6 7 8 and then the changes you had as an example in the A' and B branches, if I from A' do a git pull . B, I get: Trying really trivial in-index merge... fatal: Merge requires file-level merging Nope. Merging HEAD with ad56343c578785b8d932224a8676615e7a3e191f Merging: 9d619225e3adecee6432a36d67d140e29b0acf62 A' case ad56343c578785b8d932224a8676615e7a3e191f B: case found 1 common ancestor(s): 93765ba3f64e9c73438e52683fffa68e5a493df7 Base commit Auto-merging A CONFLICT (content): Merge conflict in A Automatic merge failed; fix up by hand and then the file contains the contents line1 2 3 HEAD/A #ifdef X if (foo) === if (foo baz) ad56343c578785b8d932224a8676615e7a3e191f/A bar 6 #endif 7 8 ie it will have does a CVS-like merge for me, and I need to fix this up. However, to _help_ me fix it up, I can now see what the diff is aganst my original version (A'), with git diff --ours (the --ours is default, so it's unnecessary, but just to make it explicit): * Unmerged path A diff --git a/A b/A index 06dd3bc..7334364 100644 --- a/A +++ b/A @@ -1,8 +1,12 @@ line 1 2 3 + HEAD/A #ifdef X if (foo) +=== + if (foo baz) + ad56343c578785b8d932224a8676615e7a3e191f/A bar 6 #endif which is very helpful especially once I have resolved it. IOW, I just edit the file and do the trivial resolve, and now I can do a git diff again to make sure that it looks ok: * Unmerged path A diff --git a/A b/A index 06dd3bc..924fc97 100644 --- a/A +++ b/A @@ -2,7 +2,7 @@ line1 2 3 #ifdef X - if (foo) + if (foo baz) bar 6 #endif ahh, looks good, so I just do git commit A and that creates the resolved merge. The obvious way (edit file in question, update-index, commit) will not only leave zero information about said conflict and actions needed to deal with it, but will lead to situation when git whatchanged will not show anything useful. Now, this is a real issue. The resolve part is pretty easy, but the fact that it's hard to see in git-whatchanged is a limitation of git-whatchanged. You need to use gitk, which _does_ know how to show merges as a diff (and yes, I just checked). Is there any SOP I'm missing here? You're just missing the fact that git-whatchanged (or rather, git-diff-tree) isn't smart enough to show merges nicely. It really _should_. It doesn't. You can choose to show merges with the -m flag, but that will show diffs against each parent, which really isn't what you want. I should do the same thing gitk does in git-diff-tree. Worse (for my use), format-patch on such tree will not give a useful patchset. You get a series of patches that won't apply to _any_ tree. Now, git-diff-tree _does_ do that. Use the -m flag, and choose the tree you want. And btw, that works with git-whatchanged too. You _can_ pass the -m flag to git-whatchanged, and it will show you each side of the merge correctly. So it _works_. It's just such a horrible format that by default it prefers to shut up about merges entirely. (I don't know of a good three-way diff format. gitk can do it, because gitk can show colors. That's a big deal when you do three-way - or more-way - diffs). And that's a fundamental problem behind all that rebase activity, AFAICS. You do _not_ want to rebase a merge. It not only won't work, it's against the whole point of rebasing. Rebasing is really only a valid operation when you have a few patches OF