Re: [PATCH] PM: Introduce PM_EVENT_HIBERNATE (was: Re: i915 hibernation patch (was: Re: 2.6.25-rc2 System no longer ...))

2008-02-23 Thread Linus Torvalds


On Sat, 23 Feb 2008, Rafael J. Wysocki wrote:
 
 Thanks for testing.  Below is the final version of the patch with a changelog
 etc.

Thanks, applied. 

With this, I also find that I dislike the use of suspend/resume for 
freezing for STD a lot less. It's still too easy to get confused, but at 
least now drivers always have total knowledge about what is really going 
on. I'd not like this interface as a driver writer, but now it's not 
fundamentally broken any more, just slightly confusing.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Suspend-devel] 2.6.25-rc2 System no longer powers off aftersuspend-to-disk. Screen becomes green.

2008-02-22 Thread Linus Torvalds


On Fri, 22 Feb 2008, Ingo Molnar wrote:
 
 btw., why isnt there an in-kernel whitelist, with perhaps a dynamic, 
 convenient /debug/s2r/whitelist append-API for distros (and testers) to 
 add more entries to the whitelist/blacklist? (for cases where the kernel 
 whitelist has not caught up yet) Which would eventually converge to 
 Utopia: s2ram that just works out of box.

The big problem with that is
 - the people who know about the devices are usually not kernel people
 - the workarounds that the whitelist requires is quite often not a kernel 
   workaround.

In other words, the most common workarounds for the s2ram whitelist is 
usually to do things like running vbetool in user-level to do VGA register 
save/restore (VBE_POST and VGE_SAVE). Sure, the kernel could do that with 
usermodehelper etc, but s2ram also has those things as command line flags 
etc, so...

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: i915 hibernation patch (was: Re: [Suspend-devel] 2.6.25-rc2 System no longer powers off aftersuspend-to-disk. Screen becomes green.)

2008-02-22 Thread Linus Torvalds


On Sat, 23 Feb 2008, Rafael J. Wysocki wrote:

 --- linux-2.6.orig/drivers/char/drm/i915_drv.c
 +++ linux-2.6/drivers/char/drm/i915_drv.c
 @@ -222,6 +222,7 @@ static void i915_restore_vga(struct drm_
  dev_priv-saveGR[0x18]);
  
   /* Attribute controller registers */
 + inb(st01);
   for (i = 0; i  20; i++)
   i915_write_ar(st01, i, dev_priv-saveAR[i], 0);
   inb(st01); /* switch back to index mode */

I'm doing this part separately, please drop it - it has nothing to do with 
the rest of the patch.

I'd also suggest that you just add a helper function like

int pm_event_powerdown(struct pm_message mesg)
{
return mesg.event = PM_EVENT_SUSPEND;
}

or something, so that you can have code like

if (pm_event_powerdown(mesg))
pci_set_power_state(pdev, PCI_D3hot);

instead of the test for EVENT_SUSPEND/HIBERNATE explicitly.

Of course, the places that already do a switch-statement are much better 
kept that way and just add PM_EVENT_HIBERNATE to the list.

 --- linux-2.6.orig/drivers/pci/pci.c
 +++ linux-2.6/drivers/pci/pci.c
 @@ -554,6 +554,7 @@ pci_power_t pci_choose_state(struct pci_
   case PM_EVENT_PRETHAW:
   /* REVISIT both freeze and pre-thaw should use D0 */
   case PM_EVENT_SUSPEND:
 + case PM_EVENT_HIBERNATE:
   return PCI_D3hot;

Didn't you miss the apci_pci_choose_state() thing that also needs this 
extension?

 Index: linux-2.6/drivers/usb/host/u132-hcd.c
 ===
 --- linux-2.6.orig/drivers/usb/host/u132-hcd.c
 +++ linux-2.6/drivers/usb/host/u132-hcd.c
 @@ -3214,14 +3214,18 @@ static int u132_suspend(struct platform_
  return -ESHUTDOWN;
  } else {
  int retval = 0;
 -if (state.event == PM_EVENT_FREEZE) {
 +
 + switch (state.event) {
 + case PM_EVENT_FREEZE:
  retval = u132_bus_suspend(hcd);
 -} else if (state.event == PM_EVENT_SUSPEND) {
 + break;
 + case PM_EVENT_SUSPEND:
 + case PM_EVENT_HIBERNATE:
  int ports = MAX_U132_PORTS;
  while (ports--  0) {
  port_power(u132, ports, 0);
  }
 -}
 + break;
  if (retval == 0)
  pdev-dev.power.power_state = state;
  return retval;

Looks like a missing close-brace to me there - you removed the final '}'.

Or am I blind?

Apart from those issues it looks fine to me.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: i915 hibernation patch (was: Re: [Suspend-devel] 2.6.25-rc2 System no longer powers off aftersuspend-to-disk. Screen becomes green.)

2008-02-22 Thread Linus Torvalds


On Sat, 23 Feb 2008, Rafael J. Wysocki wrote:
 
 In the revised patch below I redefined the PM_EVENT_* things as flags so
 that I can or them and defined PM_EVENT_SLEEP in analogy with
 CONFIG_PM_SLEEP.

Ok, looks fine by me.

  Didn't you miss the apci_pci_choose_state() thing that also needs this 
  extension?
 
 No, I didn't.  That one operates on the ACPI D* states.

Ok. I consider that just about the worst interface ever, but whatever...

 OK, please have a look at the modified patch below.

All right, I'm fine with it. Now we just need to confirm that it works for 
people..

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.25-rc2 System no longer powers off after suspend-to-disk. Screen becomes green.

2008-02-20 Thread Linus Torvalds


On Thu, 21 Feb 2008, Jeff Chua wrote:
 
 After inserting return 0; right at the top of those two functions, suspend
 (and power-off properly), and resume (without green screen) works just fine.
 
 I would like to know what they're for.

Try suspend-and-resume without X.

Also, try it on one of the more modern laptops - even *with* X.

Basically, the kernel wants to be able to do what X does, because it means 
that when it works, it works _so_ much better than doing it in X. So 
getting it working is definitely worth it.

That said, before you do anything else, try if suspend-to-RAM works. 

That's the primary goal for this code anyway, and if it works that gives a 
good hint. Suspend-to-disk is fundamentally different, and it's entirely 
possible that for the suspend-to-disk case we should just say screw 
trying to suspend/resume graphics, since you'll have the BIOS resuming 
text-mode anyway, and there are no performance or debugging advantages.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.25-rc2 System no longer powers off after suspend-to-disk. Screen becomes green.

2008-02-20 Thread Linus Torvalds


On Thu, 21 Feb 2008, Jeff Chua wrote:
 
 Works without those two functions.

Ahh. You're using the BIOS to re-initialize your video, aren't you? 

If STR works without X, then you have something else resuming graphics, 
and that may be what then interacts badly with the fact that the kernel 
also does so. 

 Ok, what's next?

Let's try to narrow it down to what the interaction is. Are you using 
something like acpi_sleep=s3_bios or similar? That's what the kernel 
support is supposed to make unnecessary in the long run, along with all 
the video mode flickering (ie we should be able to resume to the video 
mode we want, not flicker through unnecessary modes).

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.25-rc2 System no longer powers off after suspend-to-disk. Screen becomes green.

2008-02-20 Thread Linus Torvalds


On Thu, 21 Feb 2008, Jeff Chua wrote:
 
  That said, before you do anything else, try if suspend-to-RAM works.
 
 Linus, guess I missed this part ... so before touch anything, I did
 tried suspend-to-ram, and it works on console and in X.

Ok, so this is with clean current -git, and nothing disabled?

 And suspend-to-disk hangs, but I can still press and hold the power
 button to power it off.

The press and hold for five seconds is actually a hardware feature of 
the southbridge (well, I guess there is software in there too, but it's 
the embedded kind). So the fact that it powers off at that point means 
nothing, it just means that ok, your kernel is hung, but the hardware
still works ;)

This *sounds* like some part of the suspend-to-disk sequence is doing 
something stupid like trying to access the screen after it has been turned 
off, which doesn't surprise me at all. My oft-stated opinion has been that 
suspend-to-disk isn't a suspend at all, and should never have been 
confused with suspending anything.

It's snapshot-and-restore, and my opinion is that:

 - it should *never* call suspend()/resume() at all (that should be
   reserved purely for suspend-to-RAM and has real power management 
   issues!)

 - it should have a totally separate halt/unhalt/restore thing 
   that has nothing what-so-ever to do with power management, and is 
   purely about stopping the hardware for things like USB and network 
   cards (which otherwise do things like scan their command lists 
   asynchronously) and making sure that the driver state is consistent 
   with that stopped hw state.

 - the people who confuse snapshot/restore with suspend/resume are 
   horrible people that cause problems exactly because driver people then 
   get those things mixed up, and something like the video suspend/resume 
   should probably never have impacted suspend-to-disk in the first place!

HOWEVER, that's a separate fight I've had, and in the meantime:

 Then upon powering on and resume, I get the ugly green console screen. 
 I can still type and move around. Starting X runs fine. Ctrl-Alt-Del or 
 switching back to console will get back to the green screen.

.. so this implies that while the laptop apparently hung at the end of the 
snapshotting, the snapshotting did actually work, and it must have hung at 
the very end, presumably when it tried to actually turn the power off.

So there seems to be two (probably largely independent) problems:

 - the hang at shutdown that requires you to press-and-hold the power 
   button to actually cut power.

   At a guess: putting the VGA device into D3hot makes the ACPI code that 
   actually does the shutoff unhappy. Probably because it wants to access 
   the device, and ends up not ever getting the replies it wants, since 
   the hardware has been turned off.

 - the fact that we restore something wrong for you and the screen is 
   green.

   At a guess: the restore_vga ends up restoring some state that wasn't 
   correctly and fully saved.

IOW, I think your patch that disables the two lines actually ends up 
pretty much matching the two *different* problems. Can you confirm that 
doing those two parts of that patch individually actually does 
individually fix the two issues? (Ie disabling D3hot makes it shut down 
nicely but resume with green text, while disabling just restore_vga() ends 
up with shutdown problems, but once you press-and-hold the power button, 
the thing will then restore nicely)+

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.25-rc2 System no longer powers off after suspend-to-disk. Screen becomes green.

2008-02-20 Thread Linus Torvalds


On Wed, 20 Feb 2008, Rafael J. Wysocki wrote:
 
 I think we should export the target sleep state somehow.

Yeah. By *not* using -suspend() for freezing or hibernate.

Please, Rafael - just make the f*cking suspend-to-disk use other routines 
already. 99% of all hardware needs to do exactly *nothing* on 
suspend-to-disk, and the ones that really do need things tend to need to 
not do a whole lot.

For example, the freeze action for USB (which is one of the hardest 
things to suspend) should literally be something like just setting the 
controller STOP bit, and waiting for it to have stopped. The unfreeze 
should be to just clear the stop bit, while the restart should be just a 
controller reset to use the current memory image.

NONE OF THIS HAS ABSOLUTELY ANYTHING TO DO WITH SUSPEND.

It never did. I've told people so for years. Maybe actually seeing the 
problems will make people realize.

So please, we shouldn't call -suspend[_late] or -resume[_early] at 
all. Not with PMSG_FREEZE, not with PMSG_*anything*.

Can we please get this fixed some day? 

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.25-rc2 System no longer powers off after suspend-to-disk. Screen becomes green.

2008-02-20 Thread Linus Torvalds


On Wed, 20 Feb 2008, Jesse Barnes wrote:
 
 The current callback system looks like this (according to Rafael and the last 
 time I looked):
   -suspend(PMSG_FREEZE)
   -resume()
   -suspend(PMSG_SUSPEND)
   *enter S3 or power off*
   -resume()

Yes, it's very messy.

It's messy for a few different reasons:

 - the one you hit: a driver actually has a really hard time telling what 
   PMSG_SUSPEND really means.

 - more importantly, we generally don't want to suspend/resume the 
   hardware at all around a power-off, because we're going to resume with 
   the state at the time of the PMSG_FREEZE, which means that the hardware 
   has actually *changed* and been used in between!

that second case is very fundamental for things like USB devices, which in 
theory you can hold alive over a real suspend event (ie a STR event), but 
which absolutely MUST NOT be resumed over a suspend-to-disk event, because 
all the low-level request state is bogus!

So the -resume really isn't a resume at all. It's much closer to a 
-reset.

Of course, the solution to this all right now is that we have to reset 
everything even if it *is* a suspend event, so it basically means that STR 
ends up using the much weaker model that snapshot-to-disk uses.

The fundamental problem being that the two really have nothing 
what-so-ever to do with each other. They aren't even similar. Never were.

 And in the long term we could have:
   -suspend()
   *enter S3*
   -resume()

Yes, apart from all the complexities (suspend_late/resume_early). So in 
reality it's more than that, but the suspend/resume things are clearly 
nesting, and they have the potential to actually keep state around 
(because we *know* this machine is not going to mess with the devices in 
between).

IOW, here we actually can have as an option assume the device is there 
when you return.

 or:
   -hibernate()
   *kexec to another kernel to save image*
   *power off*
   -return_from_hibernate() (or somesuch)

Enough people don't trust kexec that I suspect the right thing simply is

-freeze()  // stop dma, synchronize device state
*snapshot*
-unfreeze();   // resume dma
*save image*
[ optionally -poweroff() ] // do we really care? I'd say no
*power off*
-restore() // reset device to the frozen one

which may have four entry-points that can be illogically mapped to the 
suspend/resume ones like we do now, but they really have nothing to do 
with suspending/resuming.

And notice how while freeze/restore kind of pairs like a 
suspend/resume, it really shouldn't be expected to realistically restore 
the same state at all. The restore part is generally much better seen as 
a reset hardware than a resume thing. Because we literally cannot 
trust *anything* about the state since we froze it - we might have booted 
a different OS in between etc. Very different from suspend/resume.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.25-rc2 System no longer powers off after suspend-to-disk. Screen becomes green.

2008-02-20 Thread Linus Torvalds


On Wed, 20 Feb 2008, Jesse Barnes wrote:
 
 Really, in the simple s3 case we still need early/late stuff?

Absolutely. 

Two big reasons:

 - debuggability

   I know we don't do this correctly right now, but I want to be able to 
   at least feel like we can some day actually do printk's etc through 99% 
   of the suspend/resume cycle. It's a *huge* thing for debugging problems 
   that happen in the wild, and one of the biggest issues is that we 
   currently usualyl just get a the machine died message when suspend or 
   resume doesn't work.

   Yes, doing printk's to the Intel management flash stuff can help a lot 
   here, and I want that too, but I'd really like to shut down consoles 
   individually rather than having the big hammer approach that shuts 
   them up entirely over the whole suspend/resume sequence (or not at all, 
   if you use no_console_suspend).

   And I'd *really* like to do things like VGA-console shutdown in the 
   late phase (and resume early).

 - it's actually likely *much* simpler for some devices. 

   Simple devices (and that includes things like PCI bridges etc, but 
   also potentially USB host controllers etc) are things that can often be 
   trivially suspended - all the complexity is really not in the 
   controller itself, but beyond, in the bus that it actually drives.

   And the late-suspend/early-resume means that you don't have to worry 
   about things like interrupts happening while you're suspended. Yes, 
   putting the device into D3 will disable interrupts from that device too 
   (unless there are bugs), *BUT* you may be sharing an interrupt line, 
   and interrupts may be posted and delayed, so an earlier interrupt may 
   well be pending etc.

   suspending late and resuming early just avoids those issues entirely.

Sometimes these things interact. For example, firewire is certainly not 
trivial to suspend as a subsystem thing (ie all the devices behind the 
firewire bridge need to do magic things, like spinning down etc that 
obviously can not happen in the final late phase), but the firewire 
controller itself is likely trivial to suspend/resume and can easily be 
handled in the late/early routines. And guess what? It's also exactly what 
you want to happen in case you end up using the firewire RDMA as a debug 
aid.

IOW, you want that firewire controller (and the PCI bridges) working 
really early, so that if a problem does happen when you resume some more 
complex device (say, one of the graphics chips that need X to really come 
alive), you can use the firewire rdma to read out the kernel log buffer 
from memory.

 Well, it seems like we'll have to fix drivers in either case, and isn't a 
 kexec approach fundamentally more sound and simple, design-wise?  Rafael 
 pointed out some problems with properly setting wakeup states, but I think 
 that could be overcome...

I don't personally mind kexec at all, but on the other hand, I don't care 
about suspend-to-disk in the first place. I do know that some people 
really don't want it, and I suspect that they have valid reasons. Ranging 
from memory use to simply just performance. 

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.25-rc2 System no longer powers off after suspend-to-disk. Screen becomes green.

2008-02-20 Thread Linus Torvalds


On Wed, 20 Feb 2008, Rafael J. Wysocki wrote:
  
   which may have four entry-points that can be illogically mapped to the
   suspend/resume ones like we do now, but they really have nothing to do
   with suspending/resuming.
 
 Apart from putting devices into the right low power states, that is.

And by right low power states you mean wrong low-power states, right?

The thing is, they really *are* the wrong states for 99% of all hardware.

If you really have a piece of hardware that you want to have the 
-poweroff() thing do the same as -suspend(), then hey, just use the 
same function (or better yet, use two different functions with a call to a
shared part).

Because IT IS NOT TRUE that -suspend() puts the devices in the right 
power state. The power states are likely to be totally different for S3 
and for poweroff, and they are going to differ in different ways depending 
on the device type.

One example would be the one that started this version of the whole 
discussion (shock horror! We're on subject!) ie when you do a system 
shutdown, you generally do not even *want* to put individual devices into 
low-power states at all, because the actual power off the system thing 
will take care of it for you much better.

So to take just something as simple as VGA as an example: you really do 
not want to suspend that device, because you want to see the poweroff 
messages until the very end. 

So that final device -poweroff function really has absolutely *nothing* 
in common with the device -suspend[_late] functions, simply because 
almost any sane driver would decide to do different things.

Of course, we can continue to do the insane thing and just continue to use 
inappropriate and misleadign function callback names, and then encodign 
what the *real* action should be in the argument and/or in magic 
system-wide state parameters.

So in that sense, it's certainly totally the same thing whether we call it 
-shutdown or -poweroff or -eat_a_banana, since you could always just 
look at the argument and other clues, and decide that *this* time, for 
*this* kind of device, the eat a banana callback actually means that we 
should power it off, but wouldn't it be a lot more logical to just make it 
clear in the first place that they aren't called for the same reason at 
all?

I'd claim that it's much easier for everybody (and _especially_ for device 
driver writers) to have

static int my_shutdown(struct pci_device *dev, int state)
{
.. do something ..
}

static int my_suspend(struct pci_device *dev, int state)
{
.. do something ..
}

...
.shutdown = my_shutdown,
.suspend = my_suspend,
... 

than to have

static int my_suspend(struct pci_device *dev, state)
{
.. common code ..
if (state == XYZZY)
..special code..
else
..other case code..
}

...
.suspend = my_suspend
...

even if the latter might be fewer lines. It doesn't really matter if it's 
fewer, does it, if the alternate version is more obvious about what it 
does?

The other issue is that I've long wanted to make sure that when people fix 
suspend-to-ram, they don't screw up suspend-to-disk by mistake and vice 
versa. When a driver writer makes changes, he shouldn't have the kind of 
illogical oops, unintended consequences issues in general. It should be 
pretty damn obvious when he changes suspend code vs when he changes 
snapshot/restore code.

We've somewhat untangled that on the core kernel layer, but we've left 
the driver confusion alone.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.25-rc2 System no longer powers off after suspend-to-disk. Screen becomes green.

2008-02-20 Thread Linus Torvalds


On Thu, 21 Feb 2008, Rafael J. Wysocki wrote:
 
 In fact we have acpi_pci_choose_state() that tells the driver which power
 state to put the device into in -suspend().  If that is used, the device ends
 up in the state expected by to BIOS for S4.

First off, nobody should *ever* use that directly anyway.

Secondly, the one that people should use (pci_choose_state()) doesn't 
actually do what you claim it does. It does all kinds of wrong things, and 
doesn't even take the target state into account at all. So look again.

 No.  Again, if there are devices that wake us up from S4, but not from S5,
 they need to be handled differently in the *enter S4* case (hibernation) and
 in the *enter S5* case (powering off the system).

And again, what does this have to do with (the example I used) the 
graphics hardware? Answer: nothing. The example I gave you we simply DO 
THE WRONG THING FOR.

Same thing for things like USB devices - where pci_choose_state() doesn't 
work to begin with. Why do we call suspend() on such a thing when we 
don't want to suspend it? We shouldn't. We should call freeze/unfreeze 
(which are no-ops) and then finally perhaps poweroff, and that final 
stage might want to spin things down or similar.

But *none* of it has anything to do with suspend, and none of it has 
anything to do with pci_choose_state() (much less acpi_pci_choose_state)

The fact is, we should let the driver decide, and we should make it clear 
to the driver writer what he is deciding about - rather than basically lie 
and say suspend the device and put it into D3 even when that's the last 
thing it should ever do.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.25-rc2 System no longer powers off after suspend-to-disk. Screen becomes green.

2008-02-20 Thread Linus Torvalds


On Thu, 21 Feb 2008, Rafael J. Wysocki wrote:
 
  Secondly, the one that people should use (pci_choose_state()) doesn't 
  actually do what you claim it does. It does all kinds of wrong things, and 
  doesn't even take the target state into account at all. So look again.
 
 Well, if platform_pci_choose_state() is defined, pci_choose_state() returns
 its result and on ACPI systems that points to acpi_pci_choose_state(), so in
 fact it does what I said (apart from the error path). 

Did you check closer?

I repeat: acpi_pci_choose_state() (when called from pci_choose_state()) 
doesn't even look at the target 'state'. It just blindly assumes that you 
want the deepest sleep-state you can have.

Which happens to be correct for normal suspend, but means that if you want 
to test other states (through '/sys/devices/.../power'), that sounds 
broken.

I didn't check any closer, but go check it yourself. The short and sweet: 
acpi_pci_choose_state() totally ignores its 'state' argument. Do you 
really think that's correct? But yes, pci_choose_state()' effectively 
does that too, apart from PM_EVENT_ON, which is never used.

(But the whole and only point of pci_choose_state() was to do the 
PM_EVENT_FREEZE thing differently, which it doesn't do, so I think the 
real issue here is that the interface is really rather mis-designed)

I suspect most people who ever really looked and worked on this code had a 
specific device in mind, and I'm sure that all of the code individually 
always ends up making sense from the standpoint of some specific device 
driver. It's just that it never seems to make sense from a bigger issues 
standpoint, and often seems senseless from the standpoint of other devices 
of other types.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.25-rc2 System no longer powers off after suspend-to-disk. Screen becomes green.

2008-02-19 Thread Linus Torvalds


On Tue, 19 Feb 2008, Jesse Barnes wrote:

 I found the same poweroff issue on my T61.  It turned out to be related to 
 the 
 C state code disabling interrupts when it shouldn't iirc.  Booting 
 with 'idle=poll' seems to work around the problem.
 
 The green screen problem should be fixed (see the DRM git tree for details).

..and the latter is hopefully now merged in my tree too (at least some of 
the drm updates are).

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] acer-wmi - Fail gracefully if ACPI is disabled

2008-02-11 Thread Linus Torvalds


On Mon, 11 Feb 2008, Carlos Corbacho wrote:
 
 WMI drivers, like their ACPI counterparts, should also check if ACPI is
 disabled or not, and bail out if so, otherwise we cause a crash.

Shouldn't wmi_has_guid() just return false if ACPI isn't enabled, and 
the drivers should just then always give up?

The proper way to get there would seem to be to just do this instead..

We should *not* add some random ACPI workarounds to individual drivers, we 
should just make the wmi subsystem so robust that nobody *cares* if acpi 
exists or is enabled on that machine.

Linus

---
 drivers/acpi/wmi.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/wmi.c b/drivers/acpi/wmi.c
index 36b84ab..457ed3d 100644
--- a/drivers/acpi/wmi.c
+++ b/drivers/acpi/wmi.c
@@ -673,11 +673,11 @@ static int __init acpi_wmi_init(void)
 {
acpi_status result;
 
+   INIT_LIST_HEAD(wmi_blocks.list);
+
if (acpi_disabled)
return -ENODEV;
 
-   INIT_LIST_HEAD(wmi_blocks.list);
-
result = acpi_bus_register_driver(acpi_wmi_driver);
 
if (result  0) {
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Suspend code ordering (again)

2007-12-27 Thread Linus Torvalds


On Thu, 27 Dec 2007, Robert Hancock wrote:
 
 I doubt they would prefer the later ordering in any way that matters, if the
 Windows version they were designed for uses the earlier ordering.

Well, I wouldn't say it's abotu preferring one over the other. It's very 
possible that the BIOS writers were *intending* to prefer ACPI 2.0, and it 
may even be likely that they thought that they wrote it that way, but the 
real issue is that it has apparently never ever been *tested* that way.

So yes, maybe the vendors actually thought they were a good ACPI-2.0 
implementation, but if Windows doesn't do the ordering that the 2.0 spec 
expects, then that is pretty much just a theoretical thing.

But yeah, it would be really nice to have this verified some way. Somebody 
must already know (whether it's a VM person or a BIOS writer, and whether 
they'd tell us, is obviously another issue).

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Suspend code ordering (again) (was: Re: x86: Increase PCIBIOS_MIN_IO to 0x1500 to fix nForce 4 suspend-to-RAM)

2007-12-25 Thread Linus Torvalds


On Tue, 25 Dec 2007, Rafael J. Wysocki wrote:

 the ACPI specification between versions 1.0x and 2.0.  Namely, while ACPI
 2.0 and later wants us to put devices into low power states before calling
 _PTS, ACPI 1.0x wants us to do that after calling _PTS.  Since we're following
 the 2.0 and later specifications right now, we're not doing the right thing 
 for
 the (strictly) ACPI 1.0x-compliant systems.
 
 We ought to be able to fix things on the high level, by calling _PTS earlier 
 on
 systems that claim to be ACPI 1.0x-compliant.  That will require us to modify
 the generic susped code quite a bit and will need to be tested for some time.

That's insane. Are you really saying that ACPI wants totally different 
orderings for different versions of the spec? And does Windows really do 
that?

Please don't make lots of modifications to the generic suspend code. The 
only thing that is worth doing is to just have a firmware callback before 
the device_suspend() thing (and then on a ACPI-1.0 system, call _PTS 
*there*), and on an ACPI-2.0 system, call _PTS *after* device_suspend().

Still, the fact is, some (most, I think) drivers *should* put themselves 
into D3 only in late_suspend(), so if ACPI-2.0 really expects _PTS to be 
called after that, we're just screwed. That's when the system is really 
down, interrupts disabled etc, we don't want to call anything but the 
final ACPI turn us off stuff there!

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] swsusp: Use platform mode by default

2007-10-16 Thread Linus Torvalds


On Wed, 17 Oct 2007, Qi Yong wrote:
 
 The key point is fall back to shutdown _only_ if !ops, otherwise
 don't touch hibernation_mode. And that solves my problem.

Please, when resurrecting a five-month-old discussion, give more of the 
old context.

I don't know about anybody else, but I get enough email that five months 
might as well be last century when it comes to email discussions, and I 
have long since forgotten the details of whatever caused the issue to 
begin with ;)

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: ACPI power off regression in 2.6.23-rc8 (NOT in rc7)

2007-09-25 Thread Linus Torvalds


On Tue, 25 Sep 2007 08:51:09 +0200 Damien Wyart [EMAIL PROTECTED] wrote:
 
 Hello,
 
 After testing rc8, I noticed that I couldn't power off the computer
 directly, it only got halted and I had to press the power button
 manually. Just before displaying System halted, the following message
 is displayed:
 
   ACPI : PCI interrupt for device :02:08.0 disabled
 
 I had to first revert 5a50fe709d527f31169263e36601dd83446d5744 then
 f216cc3748a3a22c2b99390fddcdafa0583791a2 (handling of Sx states) to
 recover previous behaviour.

Hmm. Those things *do* seem to be suspicious.

For example, those commits seem to move code that used to be inside 
CONFIG_PM (which pretty much *everybody* has) to be inside 
CONFIG_ACPI_SLEEP (which is a totally different thing, and depends on 
whether the user asked for suspend support or not!

Damien - does it work if you ask for SUSPEND or HIBERNATION support?

Len - why are you guys moving stuff into CONFIG_PM_SLEEP? I know you seem 
to think that absolutely *everybody* should always support suspend and 
hibernation, but the fact is, not everybody does. And it's a totally 
separate thing for normal ACPI CPU runstate support that people have used 
to manage a *running* CPU (and shutting it down).

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [1/2] 2.6.23-rc3: known regressions with patches

2007-09-01 Thread Linus Torvalds


On Wed, 29 Aug 2007, Michal Piotrowski wrote:
 
 Clocks  time
 
 Subject : double hpet clocksource  hard freeze
 References  : http://lkml.org/lkml/2007/8/23/257
 Last known good : ?
 Submitter   : Paolo Ornati [EMAIL PROTECTED]
 Caused-By   : Tony Luck [EMAIL PROTECTED]
   commit 0aa366f351d044703e25c8425e508170e80d83b1
 Handled-By  : John Stultz [EMAIL PROTECTED]
   Tony Luck [EMAIL PROTECTED]
 Patch   : http://lkml.org/lkml/2007/8/23/285
 Status  : patch available

Should be solved by 3b2b64fd311c92f2137eb7cee7025794cd854057, although 
differently from the the patch suggested above. So the regression should 
be gone a of -rc5.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: ACPI on Averatec 2370

2007-08-03 Thread Linus Torvalds


On Thu, 2 Aug 2007, Cal Peake wrote:
 On Thu, 2 Aug 2007, Linus Torvalds wrote:
 
  That said, the AMD Turion(tm) 64 X2 Mobile Technology TL-52 _should_ be 
  a REV-F CPU afaik, and it should have thus fallen through to the 
  ENABLE_C1E_MASK logic. Afaik that's broken.
  
  Cal - can you
   (a) test that forcing a return 1 from that amd_apic_timer_broken() 
   function fixes it for you.
 
 ACK
 
   (b) make that function print out the values it uses for debugging (ie the 
   xtended family and model numbers, and the MSR_K8_ENABLE_C1E MSR 
   values)?
 
 eax  CPUID_XFAM == 0x
 eax  CPUID_XMOD == 0x0004

Yeah, that's a REV-F

 MSR_K8_ENABLE_C1E lo == 0x04c14015
 MSR_K8_ENABLE_C1E hi == 0x
 lo  ENABLE_C1E_MASK == 0

And yeah, that claims that C1E is not on, but:

 amd_apic_timer_broken: forcing return value of 1

since this makes it all work for you, it does appear that the AMD local 
timer stops in C1 even when that isn't true, and as such is not useful.

Sad. It probably means that we have to disable the local timer for *all* 
modern AMD CPU's.

Thomas/Ingo - did something change in the local apic programming? Or why 
did this work before? Was it just that we didn't use the local timer apic 
for some other reason?

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: ACPI on Averatec 2370

2007-08-02 Thread Linus Torvalds


On Thu, 2 Aug 2007, Cal Peake wrote:
 
 Figured I should have sent that right after I hit the send key...
 
 processor : 0
 vendor_id : AuthenticAMD
 cpu family: 15
 model : 72
 model name: AMD Turion(tm) 64 X2 Mobile Technology TL-52

Sadly, this doesn't show the extended family stuff from cpuid.

So it doesn't show any of the bits we actually care about. Sad.

That said, the AMD Turion(tm) 64 X2 Mobile Technology TL-52 _should_ be 
a REV-F CPU afaik, and it should have thus fallen through to the 
ENABLE_C1E_MASK logic. Afaik that's broken.

Cal - can you
 (a) test that forcing a return 1 from that amd_apic_timer_broken() 
 function fixes it for you.
 (b) make that function print out the values it uses for debugging (ie the 
 xtended family and model numbers, and the MSR_K8_ENABLE_C1E MSR 
 values)?

Andi, can you check with your AMD contacts that those bits are correct.. 
Maybe the Mobile Technology things *always* have the broken Enhanced 
Halt State, regardless of any MSR settings? That would perhaps be what 
makes them Mobile.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] Introduce CONFIG_HIBERNATION and CONFIG_SUSPEND (updated)

2007-07-29 Thread Linus Torvalds


Ok, I took this, and modified Len's patch to re-introduce ACPI_SLEEP on 
top of it (I took the easy way out, and just made PM_SLEEP imply 
ACPI_SLEEP, which should make everything come out right. I could have 
dropped ACPI_SLEEP entirely in favour of PM_SLEEP, but that would have 
implied changing more of Len's patch than I was really comfy with).

Len, Rafael, please do check that the end result looks ok. 

I suspect ACPI could now take the PM_SLEEP/SUSPEND/HIBERNATE details into 
account, and that some of the code is not necessary when HIBERNATE is not 
selected, for example, but I'm not at all sure that it's worth it being 
very fine-grained.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: CONFIG_SUSPEND? (was: Re: [GIT PATCH] ACPI patches for 2.6.23-rc1)

2007-07-28 Thread Linus Torvalds


On Sat, 28 Jul 2007, Len Brown wrote:
 
 That three-liner will crash ACPI+SMP-HOTPLUG_CPU kernels on resume.

Explain that to me.

There should *be* no resume.

ACPI doesn't suspend/resume on its own, I hope.

It is all done by the top-level suspend/resume code, not by ACPI. ACPI is 
a pure helper, and if you've changed that, then I think we need to revert 
more than a few lines.

And it's the *top*level* code that selects HOTPLUG_CPU. Through 
SUSPEND_SMP (which will select HOTPLUG_CPU) and SOFTWARE_SUSPEND.

This is why it's so *totally* and *utterly* bogus for ACPI to select 
features that it has nothign what-so-ever to do with. 

In other words: ACPI isn't in the driving seat. 

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: CONFIG_SUSPEND? (was: Re: [GIT PATCH] ACPI patches for 2.6.23-rc1)

2007-07-28 Thread Linus Torvalds


On Sat, 28 Jul 2007, Linus Torvalds wrote:
 
 And it's the *top*level* code that selects HOTPLUG_CPU. Through 
 SUSPEND_SMP (which will select HOTPLUG_CPU) and SOFTWARE_SUSPEND.

In other words, the problem seems to be that 

kernel/power/main.c:
suspend_devices_and_enter()

does the proper disable/enable_nonboot_cpus(), but it does so without 
having enabled CPU hotplug.

And you seem to think that it's ACPI that should enable the hotplug, even 
though the code that actually needs it is _outside_ ACPI. And I think 
that's wrong, and that this is a bug.

So I think the real issue is that we allow that 
suspend_devices_and_enter() code to be compiled without HOTPLUG_CPU in 
the first place. It's not supposed to work that way.

Of course, it may well be that other architectures can happily suspend 
even with multiple CPU's active, which may be the cause of this mess. But 
I really think it shouldn't be ACPI that has to select the CPU hotplug, 
since it's not ACPI that _uses_ it in the first place.

Rafael: making a config option for STR (the same way we have a config 
option for hibernate), and just not allowing it on SMP without HOTPLUG_CPU 
seems to be the right thing. Len is right in that we do insane things 
right now (trying to STR with multiple CPU's still active), and I just 
don't think he's the one that should work around it!

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: CONFIG_SUSPEND? (was: Re: [GIT PATCH] ACPI patches for 2.6.23-rc1)

2007-07-28 Thread Linus Torvalds


On Sat, 28 Jul 2007, Rafael J. Wysocki wrote:
 
 OK, I'll prepare a patch to introduce CONFIG_SUSPEND, but that will require
 quite a bit of (compilation) testing on different architectures.

Sure. I'm not too worried, the fallout should be of the trivial kind. 

Also, mind basing it on the (independent) cleanups that Adrian already 
sent out. This is all intertwined..

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: CONFIG_SUSPEND? (was: Re: [GIT PATCH] ACPI patches for 2.6.23-rc1)

2007-07-26 Thread Linus Torvalds


On Thu, 26 Jul 2007, Rafael J. Wysocki wrote:
 
 Hmm, perhaps we should introduce a CONFIG_SUSPEND and change
 CONFIG_SOFTWARE_SUSPEND into CONFIG_HIBERNATION, both depending on
 CONFIG_PM?
 
 There's quite some code needed only for suspend compiled in when CONFIG_PM is
 set ...

Sounds like a good idea, although I suspect that CONFIG_PM really *is* 
fairly close to CONFIG_SUSPEND. The thing is, all the stuff it enabled is 
largely useless without at least STR.

(Yes, I realize that you can do per-driver suspend events etc, but I 
suspect not a lot of people have used it).

But from a logical standpoint, it does make sense to have a separate 
config option for the STR support.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GIT PATCH] ACPI patches for 2.6.23-rc1

2007-07-26 Thread Linus Torvalds


On Thu, 26 Jul 2007, Len Brown wrote:
 
 Feel free to share what you know about the benefits vs. the costs
 of maintaining CONFIG_ACPI_SLEEP as a build option.

Why don't you just make CONFIG_ACPI_SLEEP dependent on SOFTWARE_SUSPEND 
and STR?

 If you feel that your system has been degraded
 because it now includes what used to be excluded under
 CONFIG_ACPI_SLEEP=n, please let me know how.

I feel that I get asked to include a feature that 
 (a) I have no interest in on that machine
 (b) I didn't need to include before.

What was the advantage? And what was it that caused something like this to 
be a post-rc1 thing. That makes me really unhappy. This is a *regression*.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: CONFIG_SUSPEND? (was: Re: [GIT PATCH] ACPI patches for 2.6.23-rc1)

2007-07-26 Thread Linus Torvalds


On Thu, 26 Jul 2007, Rafael J. Wysocki wrote:
 
 My point is we have ACPI dependent on PM, so if you want ACPI, you end
 up with all of the STR stuff built in, which is what you don't like (if I
 understand that correctly).  If we have CONFIG_SUSPEND, you'll be able to
 choose ACPI alone. :-)

Good point. 

Anyway, I think the ACPI problem really is as trivial as the following 
three-liner removal fix. If the user doesn't want suspend, ACPI shouldn't 
force it on him.

A nicer fix might be to also make some of the ACPI helper routines depend 
on whether they are needed or not (which in turn will depend on whether 
suspend support has been compiled into the kernel), but quite frankly, 
that's secondary at least for me.

So if we have a few ACPI routines that will never get called (because we 
don't even enable the interfaces that would *cause* them to be called), I 
don't think that's a huge problem. It's a beauty wart, but nobody really 
cares (and it's even something that we could get the compiler to optimize 
away for us if we really cared).

Linus

---
Don't force-enable suspend/hibernate support just for ACPI

It's a totally independent decision for the user whether he wants
suspend and/or hibernation support, and ACPI shouldn't care.

Signed-off-by: Linus Torvalds [EMAIL PROTECTED]
---
 drivers/acpi/Kconfig |3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index 251344c..22b401b 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -11,9 +11,6 @@ menuconfig ACPI
depends on PCI
depends on PM
select PNP
-   # for sleep
-   select HOTPLUG_CPU if X86  SMP
-   select SUSPEND_SMP if X86  SMP
default y
---help---
  Advanced Configuration and Power Interface (ACPI) support for 
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GIT PATCH] ACPI patches for 2.6.23-rc1

2007-07-25 Thread Linus Torvalds


On Wed, 25 Jul 2007, Len Brown wrote:
 
 git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6.git release
 
 Fixes regressions -- a build failure, an oops, some dmesg spam.
 Also fixes some D-state issues and adds ACPI module auto-loading.
 Yes, I'd hoped to get the last two in before rc1.
 I'm hopeful that a couple-days into rc2 is sufficiently early for them.

I hate pulling this, but I did. However, what I hate even more after 
having done so is that ACPI now seems to select CPU hotplug. Why?

That is just *broken*. Sure, if you select STR or hibernation, we need CPU 
hotplug, but just for picking ACPI? Why?

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GIT PATCH] ACPI patches for 2.6.23

2007-07-22 Thread Linus Torvalds


On Sun, 22 Jul 2007, Len Brown wrote:
 
 please pull from: 
 
 git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6.git release

Pulled.

Btw, on a mostly unrelated note (and to balance the fact that mostly I 
only speak up to complain), I'd like to say how much I appreciate that not 
only are your git tree histories now clean and readable and don't have 
unnecessary merges, but the use of clear topic branches really makes it 
very pleasant to read and see what changed.

I'm certainly hoping the topic branches make your life easier too (ie able 
to track things by topic rather than having changes mixed up), but even if 
it doesn't, I wanted to say that I appreciate how it makes it easier for 
outsiders to follow the big outline of your development tree.

So I just wanted to say that I appreciate how it's nice to do gitk 
ORIG_HEAD.. and see the outline of what has been going on in the ACPI 
tree after I have pulled. And I'm hoping it actually helps you guys too, 
but it's appreciated regardless.

See? Sometimes I can say nice things too.

Linus not _always_ grouchy Torvalds
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] swsusp: Use platform mode by default

2007-05-11 Thread Linus Torvalds


On Fri, 11 May 2007, Rafael J. Wysocki wrote:
 
 We're working on fixing the breakage, but currently it's difficult, because
 none of my testboxes has problems with the 'platform' hibernation and I
 cannot reproduce the reported issues.

The rule for anything ACPI-related has been: no regressions.

It doesn't matter if something fixes 10 boxes, if it breaks a single one, 
it's going to get reverted.

We had much too much of the two steps forward, one step back dance with 
ACPI a few years ago, which is the reason that rule got installed (and 
which is why it's ACPI-only: in some other subsystems we accept the fact 
that sometimes we don't know how to fix some hardware issue, but the new 
situation is at least better than the old one).

I agree that it can be aggravating to know that you can fix a problem for 
some people, but then being limited by the fact that it breaks for others. 
But beign able to *rely* on something that used to work is just too 
important, and with ACPI, you can never make a good judgement of which way 
works better (since it really just depends on some random firmware issues 
that we have zero visibility into).

Also, quite often, it may *seem* like something fixes more boxes than it 
breaks, but it's because people report *breakage* only, and then a few 
months later it turns out that it's exactly the other way around: now it's 
a hundred people who report breakage with the *new* code, and the reason 
people thought it fixed more than it broke was that the people for whom 
the old code worked fine obviously never reported it!

So this is why a single regression is considered more important than ten 
fixes - because a single regressionr report tends to actually be just the 
first indication of a lot of people who simply haven't tested the new code 
yet! People for whom the old code is broken are more likely to test new 
things.

So I'd just suggest changing the default back to PM_DISK_SHUTDOWN (but 
leave the pm_ops-enter testing in place - ie not reverting the other 
commits in the series).

The patch would look something like this. Coywolf, does this fix it for 
you?

Linus

---
 kernel/power/disk.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/power/disk.c b/kernel/power/disk.c
index b5f0543..f6aa06e 100644
--- a/kernel/power/disk.c
+++ b/kernel/power/disk.c
@@ -60,9 +60,9 @@ void hibernation_set_ops(struct hibernation_ops *ops)
}
mutex_lock(pm_mutex);
hibernation_ops = ops;
-   if (ops)
-   hibernation_mode = HIBERNATION_PLATFORM;
-   else if (hibernation_mode == HIBERNATION_PLATFORM)
+
+   /* Turn off HIBERNATION_PLATFORM if we no longer have any platform ops 
*/
+   if (!ops  hibernation_mode == HIBERNATION_PLATFORM)
hibernation_mode = HIBERNATION_SHUTDOWN;
 
mutex_unlock(pm_mutex);
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] swsusp: Use platform mode by default

2007-05-11 Thread Linus Torvalds


On Fri, 11 May 2007, Rafael J. Wysocki wrote:
 
 Just to clarify, the change in question isn't new.  It was introduced by the
 commit 9185cfa92507d07ac787bc73d06c4eec7239 before 2.6.20, at Seife's
 request and with Pavel's acceptance.

Ok, if it's that old, we migt as leave it in. Clearly there weren't many 
regressions, and this isn't a case of other monsters lurking behind a lack 
of testers.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GIT PATCH] ACPI patches for 2.6.22 - part 2

2007-05-10 Thread Linus Torvalds


On Thu, 10 May 2007, Linus Torvalds wrote:
 
 Seems to work for me. My evo correctly started the fan, and stopped it 
 when the temperature went down again.

Looking at things in top, I do end up occasionally seeing spikes where 
kacpid takes 17% of CPU time, and kacpi_notify takes a few percent too. 
But the machine works ok, and it doesn't seem to be horrible:

   64 ?S 0:15 [kacpid]
   65 ?S 0:08 [kacpi_notify]

so they've gotten 23 seconds of CPU time over the 37 minutes that laptop 
has been up now. That's arguably too much, but on the other hand, I did 
end up trying to stress it out by doing some 3D stuff while compiling the 
kernel and doing git grep over the kernel tree etc.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] Add suspend/resume for HPET

2007-03-31 Thread Linus Torvalds


On Sat, 31 Mar 2007, Thomas Gleixner wrote:
 
 While I agree in principle with the patch, I'm a bit uncomfortable. The
 sys device suspend / resume ordering is not guaranteed and relies on the
 registering order.

Well, this is why we probably should try to get away from the system 
device issue, exactly because system devices are totally outside the 
normal ordering and only have a random linear order.

If the clocksources were actually in the device tree, you'd get all the 
normal guarantees about hierarchical ordering..

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] Add suspend/resume for HPET

2007-03-31 Thread Linus Torvalds


On Sat, 31 Mar 2007, Thomas Gleixner wrote:
 
 Right, but clock - sources/events need to be extremly late suspended and
 early resumed. How can we ensure this ?

Make them be at the top of the device tree by adding them early. That's 
the whole point of the device tree after all - we have an ordering that is 
enforced by its topology, and that we sort by when things were added.

Right now the way things work is (iirc - somebody like Greg should 
double-check me) is that we add all devices to the power list at 
device_add() time by traversing the devices fromt he root all the way out, 
and doing a device_add() which does a device_pm_add(), which in turn adds 
it to the power-management list - so that the list is always topologically 
sorted.

So the only thing that needs to be done is to make sure that we add the 
timer devices early during bootup - something we have to do *anyway*. If a 
device is added early in bootup, that automatically means that it will be 
suspended late, and resumed early - because we maintain that order all the 
way through..

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] Add suspend/resume for HPET

2007-03-31 Thread Linus Torvalds


On Sat, 31 Mar 2007, Maxim Levitsky wrote:
 
 So maybe I was right afrer all,
 Maybe it is better to add a suspend/resume hook to each clock source and call 
 it from timekeeping_resume() ?

Umm.. WHy not make the device tree look like this:

-- clocksource -- +-- HPET
|
+-- TSC
|
+-- i8259
|
+-- lapic timer
|
.. whatever else

and use the struct device that we *have* for this? The whole struct 
device is literally designed to do this, and to be embedded into whatever 
bigger structures you have that describes higher-level behaviour. Ie you'd 
put a struct device inside the struct clocksource.

That thingalready *has* the suspend/resume hooks, and it will mean that 
people will see the clocks in the device tree rather than have a new 
notion.


Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ PATCH] Add suspend/resume for HPET was: Re: [3/6] 2.6.21-rc4: known regressions

2007-03-29 Thread Linus Torvalds


On Thu, 29 Mar 2007, Maxim wrote:

 On Thursday 29 March 2007 07:08:58 Linus Torvalds wrote:
  
  (Or, better yet, shouldn't we set boot_hpet_disable when we decide not 
  to use the HPET, and set hpet_virt_address to NULL?)
 
 This is done here
 
 out_nohpet:
   iounmap(hpet_virt_address);
   hpet_virt_address = NULL;

No, that only clears hpet_virt_address, and thus makes all subsequent 
hpet_readl() and hpet_writel() calls oops.

But it doesn't actually *tell* anybody that the HPET is disabled, so if 
you later on do

if (is_hpet_capable()) {
time = hpet_readl(..);
..

you will just Oops!

So as far as I can see, even with your latest patch, if hpet_enable() 
fails (and triggers the goto out_nohpet cases), you'll just oops 
immediately when you try to suspend/resume the HPET.

THAT was what I meant - when we clear hpet_virt_address, we should also 
tell all potential subsequent users that the HPET is not there!

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] Add suspend/resume for HPET

2007-03-29 Thread Linus Torvalds


On Thu, 29 Mar 2007, Maxim Levitsky wrote:

 Subject: Add suspend/resume for HPET

 This adds support of suspend/resume on i386 for HPET

 Signed-off-by: Maxim Levitsky [EMAIL PROTECTED]
 
 ---
  arch/i386/kernel/hpet.c |   68 
 +++

Btw, what about arch/x86_64/kernel/hpet.c?

That thing seems totally broken. Lookie here:

  arch/x86_64/kernel/hpet.c:irqreturn_t hpet_rtc_interrupt(int irq, void 
*dev_id, struct pt_regs *regs)
  drivers/char/rtc.c:extern irqreturn_t hpet_rtc_interrupt(int irq, void 
*dev_id);

anybody see a problem? The x86-64 version doesn't seem to be very well 
maintained. Is there some fundamental reason why this file isn't shared 
across architectures?

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ PATCH] Add suspend/resume for HPET was: Re: [3/6] 2.6.21-rc4: known regressions

2007-03-29 Thread Linus Torvalds


On Thu, 29 Mar 2007, Maxim Levitsky wrote:

   I agree, and as you said I did exactly that:

Gaah. I'm blind. Sorry. Your patch did indeed do exactly that, I somehow 
overlooked it.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [3/6] 2.6.21-rc4: known regressions

2007-03-28 Thread Linus Torvalds


On Wed, 28 Mar 2007, Maxim wrote:
 
   Now I don't have a clue how to set those bits if only HPET is used as 
 clock source because now clocksources
   don't have _any_ resume hook.

One thing that drives me wild about that clocksource resume thing is 
that it seems to think that clocksources are somehow different from any 
other system devices..

Why isn't the HPET considered a device, and has it's own *device* 
suspend and resume? Why do we seem to think that only set_mode() 
etc should wake up clock sources?

It's a *device*, dammit. It should save and resume like one (probably as a 
system device). The set_mode() etc stuff is at a completely different 
(higher) conceptual level.

Thomas? It does seem like Maxim has hit the nail on the head (at least 
partly) on the HPET timer resume problems..

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ PATCH] Add suspend/resume for HPET was: Re: [3/6] 2.6.21-rc4: known regressions

2007-03-28 Thread Linus Torvalds


On Thu, 29 Mar 2007, Maxim wrote:

   I am sending here a patch that as was discussed here adds hpet to list 
 of system devices
   and adds suspend/resume hooks this way.
   I tested it and it works fine.

Ok, it certainly looks better, but it *also* looks like it just assumes 
the HPET is there. Which would work in testing _with_ a HPET, but would 
likely break on hardware without one, no?

Shouldn't there be at least something like a

if (!is_hpet_capable())
return 0;

at the top of that init routine? I'd also expect that you'd need to check 
that hpet_virt_address is valid or something?

(Or, better yet, shouldn't we set boot_hpet_disable when we decide not 
to use the HPET, and set hpet_virt_address to NULL?)

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [2/6] 2.6.21-rc2: known regressions

2007-03-09 Thread Linus Torvalds


On Fri, 9 Mar 2007, Ingo Molnar wrote:
  Some day we may have modesetting support in the kernel for some 
  graphics hw, right now it's pretty damn spotty.
 
 having no video is what i'd have expected - but getting a /hang/ is not 
 what i'd have expected.

I debugged a case exactly like this (with the TRACE_RESUME() thing) back 
last November on my laptop.

The problem is that radeonfb actually *tries* to re-initialize the 
graphics chips, but because it doesn't know all the details, it does it 
wrong, and hangs the whole chip.

I forget exactly where it happened, and it might well be different for 
you. But this is exactly what TRACE_RESUME can be used to pinpoint if you 
want to try to debug it. 

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [2/6] 2.6.21-rc2: known regressions

2007-03-08 Thread Linus Torvalds

[ Eric, Ingo, can you double-check the timer initialization after resume? 
  We appear to have several reports of date not advancing, and while this 
  could be some SATA issue, it could easily be a timer tick issue too ]

On Thu, 8 Mar 2007, Michael S. Tsirkin wrote:
 
 Here's the status with -rc3: better, but still does not work as well as 
 2.6.20.

Ok. I think we mostly solved the irq-related stuff, but you might want to 
check whether you have CONFIG_NOHZ on or off and whether that makes a 
difference.

 2. First disk access after resume takes a couple of minutes
(seemed instant with 2.6.20) during this time no new messages show on 
 console

Yeah, there is some problem with SATA resume. It would be beautiful if the 
people who actually see this could narrow it down with bisection. It 
works for me is clearly the case for many people, but not all.

But before blaming SATA, check if you have NO_HZ enabled and whether 
disabling that makes it work ok. If timeouts don't work right (or are 
*extremely* slow) things that should be instant won't be.

 3. When I switch to X (CTRL-ALT-F7), X hangs after drawing a couple of windows
after waiting for some 10 min, I rebooted.
no new messages showed up in /var/log/messages

I think this is likely just more of the disk being buggered, but it could 
again be related to NO_HZ (people report time not advancing, and that 
would make any X timeout taking forever, and you'd see exactly your 
behaviour).

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [2/6] 2.6.21-rc2: known regressions

2007-03-08 Thread Linus Torvalds


On Fri, 9 Mar 2007, Ingo Molnar wrote:
 
 disabling the following radeonfb options in the .config made resume work 
 again:

In general, don't even *try* to use radeonfb for suspend/resume.

I don't think it has ever worked, except on some very rare laptops 
(largely PPC Macs) where people had enough information to set up the 
PLL's.

I don't think the other framebuffer drivers are much better.

You're better off using the VGA console, and lettign X re-initialize the 
graphics device. That generally at least has a reasonably good chance of 
working.

Re-initializing graphics modes really is very hard. You can try with the 
BIOS video hack (I forget the kernel command line to turn it on), but we 
really do end up depending on X doing it better.

Some day we may have modesetting support in the kernel for some graphics 
hw, right now it's pretty damn spotty.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.21-rc2 regression vs. 2.6.20: AT keyboard only works with pci=noacpi

2007-03-07 Thread Linus Torvalds


On Wed, 7 Mar 2007, Ash Milsted wrote:
 
 So, I tracked this down to 2.6.21-git7, the first snapshot that gives me
 this problem.

Hmm. There is no 2.6.21-git7 (that would be the seventh nightly snapshot 
after 2.6.21 is released, which hasn't happened yet!).

Do you mean that it happens between 2.6.20-git6 and 2.6.20-git7? That 
would be git commits (the way to get them is to look at the *.id file 
that is associated with a snapshot):

66efc5a7e3061c3597ac43a8bb1026488d57e66b -git6
509cb37e173d4e39cec47238397e91b718730794 -git7

and yes, doing a

gitk 66efc5a7..509cb37e

does show an input merge. 

 Tellingly it does contain an input tree merge. I would git bisect
 but I don't have a local copy of the tree - I tried to get one, but it stopped
 halfway through the clone, probably because I had to use http... So, I hope 
 that
 helps.

Can you try rsync? It's not a great protocol in general, but it's 
perfectly fine for an initial clone..

After that, since you have already narrowed it down to a particular 
nightly snapshot, you could do the bisection startign from the known 
commits already:

git bisect start
git bisect good 66efc5a7# 2.6.20-git6 was good
git bisect bad 509cb37e # 2.6.20-git7 was bad

and you'll have less than 500 commits to test (which is quite fast to 
bisect).

If you want to do some manual checking first (ie guessing that the 
bad behaviour came from that particular input merge), you could first try 
out commit 2a598df5, which is the head commit before of the merged input 
tree (this is all trivial to see with the above gitk - the SHA1's may 
sound scary and esoteric, but they're really easy to look up).

That manual check (*if* it turns out that 2a598df5 is indeed the bad one) 
would cut down the range from good to bad to just 18 commits (the range 
would be 66efc5a7..2a598df5), and then you should be able to pinpoint the 
exact bad one from just a few reboots..

(If you have to bisect all 500 commits, it would be ~10 reboots rather 
than four or five).

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.21-rc2 regression vs. 2.6.20: AT keyboard only works with pci=noacpi

2007-03-07 Thread Linus Torvalds


On Wed, 7 Mar 2007, Ash Milsted wrote:
 
 Anyway, here's the bootlog for Dmitry from a boot with broken keyboard 
 (2.6.21-rc3):

The non-working setup doesn't get any interrupts back, and thus doesn't 
see the ACK for the \xd4\xed command.

It really looks interrupt-related (especially considering that it goes 
away when you ask ACPI to not do certain things), but at the same time, 
the differences between -git6 and -git7 really don't seem to have *any* 
ACPI or PCI irq routing changes, so I think this really is related to the 
input-layer, and perhaps the real difference between ACPI irq routing and 
not is just the timing or IO acecss patterns that you get when you use the 
local apic vs the i8259 legacy irq controller.

For example, if there is a edge-triggered interrupt involved (and both 
keyboard *and* mouse are edge-triggered), the io-apic and the i8259 work 
differently: temporarily disabling the interrupt will reset the edge 
trigger logic on the i8259, but not on an IO-APIC.

So the lack of interrupts could be due to the input layer not clearing the 
interrupt source during setup, so some *old* interrupt just stays around, 
and because it's always set, on an IO-APIC it will never show as an edge 
at all - but on the i8259 the very action of registering the irq routine 
will create an edge.

There's some reason to believe that you may have a pending interrupt 
waiting: for the working case, you actually have:

 Mar  7 23:20:41 joker atkbd.c: Spurious ACK on isa0060/serio0. Some program 
 might be trying access hardware directly.

so there was certainly *something* unexpected there.

I'm not saying that's it, but it could explain why something that looks 
interrupt-related and that changes depending on whether you use ACPI to 
set up interrupts or not can have these kinds of reasons, that just depend 
on which interrupt controller the kernel happens to use, even though it's 
not really about lost interrupts at all, but just a driver that doesn't 
acknowledge a pending one.

Or something.

Doing the git bisect would really help.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GIT PATCH] ACPI patches for 2.6.20-rc1

2006-12-21 Thread Linus Torvalds


On Wed, 20 Dec 2006, Len Brown wrote:
 
 please pull from: 

Is this really all obvious bug-fixes? There seems to be a lot of 
development there that simply isn't appropriate after an -rc1 any more.

I want 2.6.20 to be stable, and one of the things I'm doing is to be 
strict about the merge window.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [linux-pm] [RFC] ACPI vs device ordering on resume

2006-12-01 Thread Linus Torvalds


On Fri, 1 Dec 2006, Pavel Machek wrote:

   So it looks like we need this sequence:
   
   enable_nonboot_cpus() /* INIT */
   finish()  /* _WAK */
   device_resume()
  
  Can somebody remind me about this immediately after 2.6.19?
 
 Remind. But note that freezer is not yet SMP safe... Rafael is working
 on that.

Thanks.

On the other hand, I really wonder (and suspect) whether the problem isn't 
really the freezer or even the kernel resume ordering, but simply an ACPI 
internal resume ordering thing.

Doesn't ACPI have per-device WAK calls anyway? Shouldn't we just call 
those _individually_ as we walk the device tree (perhaps in the 
early_resume stage) rather than calling them all in one chunk?

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: ACPI breakage (Re: 2.6.19-rc6: known regressions (v2))

2006-11-18 Thread Linus Torvalds

On Sat, 18 Nov 2006, Starikovskiy, Alexey Y wrote:

 May because it does not have a single common line with the previous
 patch?

Yeah, I do agree that it _looks_ very different as a patch, but it ends up 
having all the same execution profiles..

It's been too long since I debugged the previous problem, so I don't 
remember the exact details any more (back then I enabled ACPI debugging 
and watched the messages scroll by etc - this time I initially thought it 
was interrupt-related due to the other irq problems we've had, so I 
started bisecting immediately _without_ doing any ACPI debugging stuff, 
and by the time I actually bisected down enough, I recognized the problem, 
so I didn't do all the same enable ACPI messages and look deeply into 
what is going on thing).

But if I remember correctly, what happens is _roughly_ something like 
this:

 - thermal event happens - the CPU is getting warm, and the fan needs to 
   start up. Quite often, this happened early during boot (which is quite 
   busy - some init scripts are disgustingly CPU-intensive mainly due to 
   using inefficient scripting languages), but if it didn't happen there, 
   it's easy enough to force to happen other ways.

 - part of the handling is acpi_os_execute() for something (don't ask me 
   what), but the interestign thing is how that acpi_os_execue() then 
   ends up causing a _recursive_ event.

 - we handle the original event in kacpid, and hand over the new one as a 
   notification event. But the event keeps on happening, and kacpid keeps 
   on running, and the other thread doesn't actually ever _run_ because 
   kacpid holds he ACPI lock and is constantly busy.

 - we not only are constantly running in kernel space, we also end up 
   eventually running out of memory for allocating all the work queue 
   entries.

So the reason the old code works is because everything is done in a single 
thread, and yes, we end up getting multiple events, but because the queue 
is all done onto the same queue that is _handling_ the events in the first 
place, and because it's a FIFO queue, the notification events get handled 
_before_ the later events.

So with the single-threaded situation, you basically end up always doing 
the events in the same order they came in. In the two separate threads 
case, you don't, and one thread will end up generating events forever, 
waiting for them to happen, but they never _do_ happen, so you have a 
lockup _and_ eventually an infinite event queue for the other thread.

 Or may be because it fixes all the current AMD-HP notebooks? 
 Or may be because it did not fail while being in -mm?

I'm afraid that -mm doesn't get as much testing as it used to get. 

Also, I do realize that the patch fixes other problems, but we have long 
had a very strict policy that we do NOT accept regressions. Immediately 
when you start accepting regressions, you will never know whether you're 
going forward of backwards. It's better to have a known _old_ bug than to 
introduce a new one.

So the no regressions! rule ends up trumping pretty much every single 
other issue. It's unacceptable to have machines that used to work, 
suddenly stop working. Even if it fixes another machine. 

ACPI didn't use to have that rule, and it was wild and crazy. Maybe more 
bugs got fixed, but the problem with accepting regressions is that nobody 
can _ever_ trust that system. You do not want to have people _afraid_ of 
upgrading - they should feel confident that upgrading never introduces any 
new problems.

(Of course, that can never be reached 100%, but it's very much part of the 
goal. It kind of falls into the same backwards compatibility on 
interfaces absolute goal: it's ok to do new things, but you can never 
allow them to break old programs)

 I will not sneak it in again, I promise. 

Feel free to send me test patches when working on these things, because I 
have no trouble at all to test my particular machine.

I think you'll find the ACPI dumps etc for that machine in your archives, 
because I've sent them to Len and the acpi lists several times, but if you 
want to get AML disassemblies etc, just tell me how. I've done them 
before, but I work on this seldom enough that I always forget what the 
magic incantations are, and where to get the tools etc.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: ACPI breakage (Re: 2.6.19-rc6: known regressions (v2))

2006-11-18 Thread Linus Torvalds


On Sat, 18 Nov 2006, Starikovskiy, Alexey Y wrote:
 
 I've sent you a test patch back in July, but did not get a reply. May be
 due to OLS?

Heh. Whenever you send me something like that, and I don't answer within a 
few days, you can pretty much depend on me not answering - my mailqueue 
just fills up too fast. And yeah, it might have been during OLS. Just 
re-send when it happens.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: ACPI breakage (Re: 2.6.19-rc6: known regressions (v2))

2006-11-18 Thread Linus Torvalds


On Sat, 18 Nov 2006, David Brownell wrote:
 
 Running right now with a patch reverting the update which
 made trouble on Linus' machine, but without Alexey's two
 tweaks to the EC interrupt handler.  So far so good, even
 after doing things which had previously caused AE_TIME
 errors pretty quickly.  But then, the errors weren't what
 I'd call reproducible either.

Ok, goodie. 

Adrian, that means that there's one less regression on your list, unless 
David reports that he can reproduce it again (I don't think he will be 
able to: all the other ACPI changes looked relatively harmless, at least 
in the particular area of ACPI changes I looked at)

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


ACPI breakage (Re: 2.6.19-rc6: known regressions (v2))

2006-11-17 Thread Linus Torvalds


On Fri, 17 Nov 2006, Adrian Bunk wrote:
 
 Subject: nasty ACPI regression, AE_TIME errors
 References : http://lkml.org/lkml/2006/11/15/12
 Submitter  : David Brownell [EMAIL PROTECTED]
 Handled-By : Len Brown [EMAIL PROTECTED]
  Alexey Starikovskiy [EMAIL PROTECTED]
 Status : problem is being debugged

I do not know if this is related, but testing one of my laptops (always a 
good idea to check the week before release) shows that my trusty old 
Compaq N620c locks up rather quickly at boot with the current -git tree. 

Total lockup - no sysrq, no messages, no nothing.

I've mostly bisected it (what the _hell_ did we do before git bisect?), 
and right now I know:

commit 9aaed2b42d00d4abb2748d72d599a8033600e2bf is bad (that's Len's pull 
trivial into test branch) commit.

v2.6.19-rc2 seems all good.

Which leaves a chunk of just a few ACPI commits left to bisect. 

I'll do five or so more reboots, and I should be able to tell exactly 
which commit breaks. It almost always locks up very early during boot 
(generally during the initializing udev phase), although sometimes it 
survives a bit further..

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: ACPI breakage (Re: 2.6.19-rc6: known regressions (v2))

2006-11-17 Thread Linus Torvalds


On Fri, 17 Nov 2006, Linus Torvalds wrote:
 
 Total lockup - no sysrq, no messages, no nothing.

Dammit.

It looks like 37605a6900f6b4d886d995751fcfeef88c4e462c, and I should have 
realized that immediately.

That commit re-introduces the bug that we already reverted once.

Why the hell did that idiotic thing go in, when we had to revert it once 
already (see commit 72945b2b90a5554975b8f72673ab7139d232a121 for the 
earlier revert).

It was broken then, it is broken now. Nothing has changed.

Why did you guys try to sneak it in again? Last time this same use a 
second workqueue patch went in (in a different form), we had _exactly_ 
the same problems, with total lockups, and way too high CPU usage.

The bugzilla entry that you refer to in that commit is even the same one 
that discussed why the _original_ patch was totally broken.

It's even the same AUTHOR who wrote the original buggy patch, that pushed 
through the same buggy patch AGAIN.

Dammit, this is frustrating.

Why did people expect it to suddenly not be buggy?

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [linux-pm] [RFC] ACPI vs device ordering on resume

2006-11-15 Thread Linus Torvalds


On Wed, 15 Nov 2006, Len Brown wrote:
 
 So it looks like we need this sequence:
 
 enable_nonboot_cpus() /* INIT */
 finish()  /* _WAK */
 device_resume()

Can somebody remind me about this immediately after 2.6.19?

No way am I going to make that kind of a major ordering change right now, 
especially as the thing isn't apparently even a real regression (just more 
fallout from enabling MSI).

It would be good if people who are affected (and people in general that 
are interested in power management) would test out the patch.

In particular, this is an even bigger change than the one suggested by 
Stephen. It means, for example, that we will have device resume (and 
process thawing) being called when we're in SMP mode - and that may or may 
not have issues. So this is a really scary patch to me, no way in hell 
will I apply it right now.

But if people try it out, it would be good..

Btw, this is a clear example of where it might be good to start actively 
using the early_resume thing, and do PCI bus resume there. We might want 
to do early_resume _before_ calling the bios (I'm not at all convinced 
that the firmware will do the right thing without the PCI buses set up, 
for example), and _before_ thawing processes. Then, we might let 
individual devices do their device resume in the normal late resume 
phase.

Greg already carries the patch around for that PCI bus early resume thing, 
I think. 

We need to test these things.

Linus


diff --git a/kernel/power/main.c b/kernel/power/main.c
index 873228c..2989609 100644
--- a/kernel/power/main.c
+++ b/kernel/power/main.c
@@ -132,12 +132,12 @@ int suspend_enter(suspend_state_t state)
 
 static void suspend_finish(suspend_state_t state)
 {
-   device_resume();
-   resume_console();
-   thaw_processes();
enable_nonboot_cpus();
if (pm_ops  pm_ops-finish)
pm_ops-finish(state);
+   device_resume();
+   resume_console();
+   thaw_processes();
pm_restore_console();
 }
 
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] ACPI vs device ordering on resume

2006-11-14 Thread Linus Torvalds


On Tue, 14 Nov 2006, Stephen Hemminger wrote:
 
 During the pci_restore process, the MSI information and the PCI command 
 register 
 are restored properly. But later during resume, inside the ACPI evaluation of
 the WAK method, the PCI_COMMAND  INTX_DISABLE (0x400) flag is being cleared.
 My guess is that the BIOS ends up doing some resetting of devices.

Hmm.. I think calling pm_ops-finish() early is the right thing to do, 
so your patch looks fine. I'd hate to apply it at this stage in the 2.6.19 
development, though. Comments?

Looking at the code (at least for ACPI), pm_ops-finish() function does 
things that we'd generally want done early. I'm a bit nervous about the 
nesting, though - we call the -prepare function _before_ we do the 
device suspend stuff, so if -finish undoes something that was done by 
-prepare, then we will restore the devices with the state they were in 
_after_ the -prepare.

So from a logical _nesting_ perspective, the -finish() routine should 
happen after devices have been restored. And anything that ACPI does to 
undo -enter should have been done early by ACPI itself as it was 
exiting that -enter routine - that would make the things nest properly.

Gaah. Since there is no way to know what the HELL those ACPI routines will 
actually end up doing, there is no way to know what is the right answer. 
Does anybody know what Windows does? 

I think Stephen's patch is likely good, but I really don't see myself 
applying it to my tree right now. It should either go into -mm, or early 
into the post-2.6.19 tree (or, most likely, both).

Anybody?

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.19-rc - ThinkPads

2006-11-01 Thread Linus Torvalds


On Wed, 1 Nov 2006, Andi Kleen wrote:

 Ok please revert the i386 patch for now then if it fixes the ThinkPads. 
 The x86-64 version should be probably fixed too, but doesn't cleanly. I will 
 send you later a patch to fix this there properly.

Actually, I should have just fixed the ordering. I did some cleanups too, 
but those are unrelated (except in the sense that I wanted to look at the 
assembly code, and the cleanups made the code generation at least half-way 
sane!)

I've pushed out the changes, but here is the part that may or may not 
matter for anybody who wants to test it if they don't use git or if it 
hasn't mirrored out yet. Michael? Martin?

Andi: I think the patches should work pretty much as-is for x86-64 too, 
since all the issues would seem to be similar. 

I'm not entirely happy with ioapic_write_entry() now either (if we 
change an entry that was already unmasked, we should probably mask it 
first by writing the low word with the mask bit set, then write the high 
word, and then write the low word again), but 

 - this makes us match the ordering we _used_ to have, so if the cleanup 
   broke things for people, this should unbreak it, and at least not be 
   any worse than it used to be.

 - when we write new unmasked entries, they all _should_ have been masked 
   before, so hopefully the change a unmasked entry while it's unmasked 
   case doesn't actually ever happen. But I didn't actually _check_.

Somebody should look into that case. Does anybody feel like they want to 
learn more about the IO-APIC? Halloween is over and gone, but if you want 
to scare small children _next_ year, telling them about the IO-APIC is 
likely a good strategy.

Linus

---
commit f9dadfa71bc594df09044da61d1c72701121d802
Author: Linus Torvalds [EMAIL PROTECTED]
Date:   Wed Nov 1 10:05:35 2006 -0800

i386: write IO APIC irq routing entries in correct order

Since the mask bit is in the low word, when we write a new entry, we
need to write the high word first, before we potentially unmask it.

The exception is when we actually want to mask the interrupt, in which
case we want to write the low word first to make sure that the high word
doesn't change while the interrupt routing is still active.

Signed-off-by: Linus Torvalds [EMAIL PROTECTED]

diff --git a/arch/i386/kernel/io_apic.c b/arch/i386/kernel/io_apic.c
index eb10bd5..507983c 100644
--- a/arch/i386/kernel/io_apic.c
+++ b/arch/i386/kernel/io_apic.c
@@ -147,12 +147,34 @@ static struct IO_APIC_route_entry ioapic
return eu.entry;
 }
 
+/*
+ * When we write a new IO APIC routing entry, we need to write the high
+ * word first! If the mask bit in the low word is clear, we will enable
+ * the interrupt, and we need to make sure the entry is fully populated
+ * before that happens.
+ */
 static void ioapic_write_entry(int apic, int pin, struct IO_APIC_route_entry e)
 {
unsigned long flags;
union entry_union eu;
eu.entry = e;
spin_lock_irqsave(ioapic_lock, flags);
+   io_apic_write(apic, 0x11 + 2*pin, eu.w2);
+   io_apic_write(apic, 0x10 + 2*pin, eu.w1);
+   spin_unlock_irqrestore(ioapic_lock, flags);
+}
+
+/*
+ * When we mask an IO APIC routing entry, we need to write the low
+ * word first, in order to set the mask bit before we change the
+ * high bits!
+ */
+static void ioapic_mask_entry(int apic, int pin)
+{
+   unsigned long flags;
+   union entry_union eu = { .entry.mask = 1 };
+
+   spin_lock_irqsave(ioapic_lock, flags);
io_apic_write(apic, 0x10 + 2*pin, eu.w1);
io_apic_write(apic, 0x11 + 2*pin, eu.w2);
spin_unlock_irqrestore(ioapic_lock, flags);
@@ -274,9 +296,7 @@ static void clear_IO_APIC_pin(unsigned i
/*
 * Disable it in the IO-APIC irq-routing table:
 */
-   memset(entry, 0, sizeof(entry));
-   entry.mask = 1;
-   ioapic_write_entry(apic, pin, entry);
+   ioapic_mask_entry(apic, pin);
 }
 
 static void clear_IO_APIC (void)
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.19-rc - ThinkPads

2006-11-01 Thread Linus Torvalds


On Wed, 1 Nov 2006, Michael S. Tsirkin wrote:

  I've pushed out the changes, but here is the part that may or may not 
  matter for anybody who wants to test it if they don't use git or if it 
  hasn't mirrored out yet. Michael? Martin?
 
 I pulled the latest git, and seems to work for me, thanks.
 This still could be a false negative (happened already) so I'll
 continue using this, and will post the results.

Ok, thanks.

  Somebody should look into that case. Does anybody feel like they want to 
  learn more about the IO-APIC? Halloween is over and gone, but if you want 
  to scare small children _next_ year, telling them about the IO-APIC is 
  likely a good strategy.
 
 Hmm, sounds interesting :)
 Is this a good place to start (I'm feeling lucky hit for IO-APIC)?
 http://www.intel.com/design/chipsets/datashts/290566.htm

Yeah, that's the datasheet. Note that a lot of the IO-APIC complexity is 
not so much in the programming interfaces themselves, but in keeping track 
of how the heck the thing is connected (ie ExtINT vs SCI vs normal apic 
interrupt etc).

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.19-rc - ThinkPads

2006-10-31 Thread Linus Torvalds


On Wed, 1 Nov 2006, Ernst Herzberg wrote:
 
 still bisecting, will report the result.

Figuring out what caused an apparent change of behaviour is definitely a 
good idea - it might give us some clue to what really is going on.

(Or it might not. Sometimes the patch that triggers changes really doesn't 
seem to have anything to do with anything, and it literally was just a 
latent bug that just happened to be exposed by something that had nothing 
to do with anything at all but perhaps timing. But that's pretty rare in 
the end. It happens, but it's definitely not the common case at all, and I 
think it's great that you're bisecting even if there is a possibility 
that we'll be left with a big Huh? Whaa? as the end result ;^)

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.19-rc - ThinkPads

2006-10-31 Thread Linus Torvalds


On Wed, 1 Nov 2006, Michael S. Tsirkin wrote:
 
 I've been bisecting ACPI/suspend thinkpad issue myself and I seem to get
 eea0e11c1f0d6ef89e64182b2f1223a4ca2b74a2 good,
 cf4c6a2f27f5db810b69dcb1da7f194489e8ff88 bad.

Very interesting..

That commit cf4c6a2f on the face of it looks like an obvious cleanup, but 
looking closer, it actually changes the order of the apic writes in many 
cases (high word first - low word first).

It also does something else that looks really really wrong: it turns an 
atomic update ioapic and set irq-info into two separate events, where 
interrupts can happen in between. Same goes for resume (instead of 
atomically changing all entries with the ioapic lock held, it now does 
them individually, and locks them individually).

I wonder if the order matters more, though. Andi? We _used_ to write the 
high word first, and I think the order matters. The low word contains the 
enable bit, for example, so when enabling an interrupt, you should write 
the low word last, when you disable it you should write the low word 
first.

And that's exactly what we _used_ to do, and it's what that particular 
commit totally and utterly _broke_.

I think that commit should either be reverted, or the code should be fixed 
to do the writes in the proper order. 

I suspect reverting it is the right thing to do - the patch only 
introduces bugs, an doesn't actually _fix_ anything, it just cleans 
things up.

Andi, you need to be a hell of a lot more careful! Apparently x86-64 is 
also totally broken in this regard, because somebody didn't realize that 
the ordering _matters_.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.19-rc3: known unfixed regressions (v3)

2006-10-30 Thread Linus Torvalds


On Mon, 30 Oct 2006, Jun'ichi Nomura wrote:
 
 Please revert the patch. I'll fix the wrong error handling.
 
 I'm not sure reverting the patch solves the ACPI problem
 because Michael's kernel seems not having any user of
 bd_claim_by_kobject.

Yeah, doing a grep does seem to imply that there is no way that those 
changes could matter.

Michael, can you double-check? I think Jun'ichi is right - in your kernel, 
according to the config posted on bugzilla, I don't think there should be 
a single caller of bd_claim_by_disk, since CONFIG_MD is disabled.

So it does seem strange. But if you bisected to that patch, and it 
reliably does _not_ have problems with the patch reverted, maybe there is 
some strange preprocessor thing that makes grep not find the caller.

Michael, you also reported:

 Reset to d7dd8fd9557840162b724a8ac1366dd78a12dff seems to hide part of 
 the issue (I have ACPI after kernel build, but not after 
 suspend/resume).  Both reverting this patch, and reset to the parent of
 this patch seem to solve (or at least, hide) both problems for me (no 
 ACPI after suspend/resume and no ACPI after kernel build).

(where that d7dd8f.. is actually missing the initial 4 - I think you 
cut-and-pasted things incorrectly). 

So I wonder.. You still had ACPI working _after_ the kernel build even 
with that patch in place, and it seems that suspend/resume is the real 
issue. Martin Lorenz reports on the same bugzilla entry, and he only has 
problems with suspend/resume.

I assume that compile the kernel just triggers some magic ACPI event 
(probably fan-related due to heat), and I wonder if the bisection faked 
you out because once you get close enough the differences are small 
enough that the kernel compile is quick and the heat event doesn't 
actually trigger?

See what I'm saying? Maybe the act of bisecting itself changed the 
results, and then when you just revert the patch, you end up in the same 
situation: you only recompile a small part (you only recompile that 
particular file), and the problem doesn't occur, so you'd think that the 
revert fixed it.

If it's heat-related, it should probably trigger by anything that does a 
lot of CPU (and perhaps disk) accesses, not just kernel builds. It might 
be good to try to find another test-case for it than a kernel recompile, 
one that doesn't depend on how much changed in the kernel..

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] Oops on boot (probably ACPI related)

2006-09-27 Thread Linus Torvalds


On Wed, 27 Sep 2006, Linus Torvalds wrote:
 
 On Wed, 27 Sep 2006, Andi Kleen wrote:
  
  I expect this patch to fix it.
 
 Andrew, Kyle, can you verify?

Not that it really matters. Andi sure as hell pinpointed a real problem 
with the new and broken inline asm. That's almost certainly the bug that 
crept in during the recent rewrite.

HOWEVER, now that I look more closely at the rewrite, I'm really wondering 
whether the rewrite was worth it at all. It generates smaller code, but at 
the expense of

 - the actual cache-footprint is bigger
 - the branch will now be mis-predicted by default

Since the smaller code really only tends to matter from a cache 
usage standpoint, I don't know if I'm at all convinced.

The fact that rewinders have problems is fairly immaterial. Maybe we 
should just take this as a hint that all the stupid rewinding code was 
wrong in the first place, and we should stop doing that? We can go back 
to just printing out our stacktrace guesses, that has worked for us for a 
long time, and the stack unwinding simply looks _fundamentally_ flawed.

So I have a real urge to just revert that change anyway.

Are there any _real_ advantages to this broken unwinding code that has had 
more bugs that Windows XP?

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] Oops on boot (probably ACPI related)

2006-09-27 Thread Linus Torvalds


On Wed, 27 Sep 2006, Andi Kleen wrote:
 
 It doesn't matter much because these days this stuff is all out of lined
 anyways and in a single function. And the dynamic branch predictor
 in all modern CPUs will usually cache the decision (unlocked) there.

Ahh, good point. Once there's only one copy, the branch predictor will get 
it right (and the code size won't much matter)

  Are there any _real_ advantages to this broken unwinding code that has had 
  more bugs that Windows XP?
 
 I thought for a long time we didn't need it either, but these days with all 
 these callbacks in some parts of the kernel (driver model, others) and you 
 get a oops with 60+ entries it is just too much trouble to figure it out 
 manually.

Ok, fair enough. I'll apply your fix (which in itself is obviously 
correct).

I just wanted to bring up the possibility that we should just remove the 
(fragile) unwinder.

But let's leave it for another day, if it keeps being problematic.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: kacpi_notify?

2006-07-13 Thread Linus Torvalds


On Thu, 13 Jul 2006, Starikovskiy, Alexey Y wrote:
 
 I'm terribly sorry that my patch broke on your machine.
 May I ask you to send me or attach to #5534 output of acpidump from this
 machine?

I'll send it in another email, since I already generated it for Len ;)

 Do you think that the whole idea is crap, or if I limit number of
 possible spawned threads and forsibly put current thread to sleep (which
 will release ACPICA executer mutex), as it happens in DSDT of nx6125 it
 will be possible to use it?

I don't think the _idea_ is crap per se, but it would at a minimum need a 
thread limit. But I think it's the wrong approach: especially if you put 
the current thread to sleep, you really don't want another thread at all, 
you are really just working around a problem that is totally internal to 
acpi (and the AML interpreter in particular).

So I think the problem really lies elsewhere, and that the whole thread 
approach was trying to paper over it. And having a limited set of threads 
is probably potentially _worse_ then what we have now.

Is there no way to have the AML interpreter have some state, and just push 
that current interrupted state back onto the event queue, and just start 
executing the new one instead? That sounds like it should fix the _real_ 
problem - a kind of mini-scheduler for ACPI events?

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


kacpi_notify?

2006-07-12 Thread Linus Torvalds

Hmm.

What's up with this?

 2341 ?D 0:00 [kacpid_notify]
 2342 ?D 0:00 [kacpid_notify]
 2343 ?D 0:00 [kacpid_notify]
 2344 ?D 0:00 [kacpid_notify]
 2345 ?D 0:00 [kacpid_notify]
 2346 ?D 0:00 [kacpid_notify]
 2347 ?D 0:00 [kacpid_notify]
 ...

(apparently about 300 of those processes, at which point the machine just 
hangs, because even root cannot start any new processes, and I couldn't 
actually get to debug this at all).

What would it be waiting on, and why?

This machine doesn't have any module support (at all), and I haven't 
booted a new kernel on it in quite a while, so this isn't necessarily new 
behaviour, but the last kernel I tried (which did _not_ have this problem, 
obviously) was in April (commit 6e5882cfa24e1456702e463f6920fc0ca3c3d2b8, 
to be exact).

Now, that's 6000+ commits ago, so I'd rather not even bisect this, if 
somebody can come up with a more obvious explanation of why kacpid_notify 
would be started over and over and over again, only to always get stuck..

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kacpi_notify?

2006-07-12 Thread Linus Torvalds


On Wed, 12 Jul 2006, Linus Torvalds wrote:
 
 (apparently about 300 of those processes, at which point the machine just 
 hangs, because even root cannot start any new processes, and I couldn't 
 actually get to debug this at all).

With ACPI debugging, I notice that it finally dies due to ACPI Error 
AE_NO_MEMORY. Which I guess is just due to thousands of kacpi_notify 
processes, and tons of allocations.

With ctrl+scrolllock, I finally got something. The traceback for the 
D-state (millions and millions of them) is

__down_failed
acpi_ut_acquire_mutex
acpi_ex_enter_interpreter
acpi_ns_evaluate
acpi_evaluate_object
acpi_evaluate_integer
acpi_os_execute_thread
acpi_thermal_get_temperature
acpi_thermal_check
..

and 'kacpid' seems to be stuck using all CPU time, with the thing doing 
something like:

EIP is at delay_tsc+0xb/0x13
 EFLAGS: 0283Not tainted  (2.6.18-rc1-g155dbfd8 #24)
EAX: 4aa48900 EBX: 00026be1 ECX: 4aa40b7e EDX: 001a
ESI:  EDI: c039300d EBP: c0390df3 DS: 007b ES: 007b
CR0: 8005003b CR2: 080516f0 CR3: 362dc000 CR4: 06d0
 [c01c94c0] __delay+0x6/0x7
 [c01f23ef] acpi_os_stall+0x1d/0x29
 [c0201f11] acpi_ex_system_do_stall+0x37/0x3b
 [c0200fca] acpi_ex_opcode_1A_0T_0R+0x85/0xc8
 [c01f5308] acpi_ds_exec_end_op+0x133/0x553
 [c020d0f3] acpi_ps_parse_loop+0x777/0xbe0
 [c020c488] acpi_ps_parse_aml+0xd8/0x2d5
 [c020dbbe] acpi_ps_execute_pass+0xa9/0xd2
 [c020dd6a] acpi_ps_execute_method+0x153/0x231
 [c02095e1] acpi_ns_evaluate+0x179/0x24c
 [c01fc12e] acpi_ev_asynch_execute_gpe_method+0xeb/0x159
 [c01f2083] acpi_os_execute_deferred+0x19/0x21
 [c01226a0] run_workqueue+0x68/0x95
 [c01f206a] acpi_os_execute_deferred+0x0/0x21
 [c0122b2e] worker_thread+0xf9/0x12b
 [c03570bf] schedule+0x469/0x4cc
 [c0113bfb] default_wake_function+0x0/0xc
 [c0122a35] worker_thread+0x0/0x12b
 [c01249bb] kthread+0xad/0xd8
 [c012490e] kthread+0x0/0xd8
 [c0101005] kernel_thread_helper+0x5/0xb

which I assume is the thing that holds the AML semaphore, and isn't 
releasing it.

Is there any sane debugging info I can send people?

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: kacpi_notify?

2006-07-12 Thread Linus Torvalds


On Wed, 12 Jul 2006, Linus Torvalds wrote:
 
 I've got a hundred-odd commits to go, but the next bisection test happens 
 to be the parent of my merge (your merge linus into release branch 
 merge: ae6c859b7dcd708efadf1c76279c33db213e3506), so if I'm right, I'd 
 expect that to be a bad tree.

Yup.

And yes, the problem seems to co-incide with getting about 300 acpi 
interrupts per second. After about 9500 interrupts (each of which seems to 
create one of these things), the machine is basically dead.

Ten thousand kacpid_notify threads is too much. Regardless of what brought 
on this bug, I think there's something wrong in anything that keeps on 
notifying things without keeping track of how many outstanding 
notifications it already has.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: kacpi_notify?

2006-07-12 Thread Linus Torvalds


On Wed, 12 Jul 2006, Brown, Len wrote:

 Likely related to bugzilla-5534
 
 b8d35192c55fb055792ff0641408eaaec7c88988

Well, that one certainly looks likely.

Any reason to not just revert it? The fundamental problems that it 
introduces are obviously much worse than the fix.

Linus 
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug 5534] No thermal events until acpi -t - HP nx6125

2006-07-12 Thread Linus Torvalds

On Wed, 12 Jul 2006, Linus Torvalds wrote:
 
 Any reason to not just revert it? The fundamental problems that it 
 introduces are obviously much worse than the fix.

Ok, that commit b8d35192c55fb055792ff0641408eaaec7c88988 is definitely 
horribly horribly broken.

I'm going to revert it, because the fix is much worse than the problem 
it fixes. Instead of a fan not coming on, I now have ten thousand threads 
killing the machine instead - and the fan _still_ doesn't come on..

The thread approach doesn't even fix the fundamental problem itself. It 
doesn't help to start a new thread, when the AML interpreter holds a 
semaphore over the sleep, causing the events to be serialized, and the 
thermal events to be delayed _anyway_.

The only thing the threading causes is that it guarantees that the machine 
ends up being totally overwhelmed by the thousands of threads, all blocked 
on the same semaphore.

I don't know what the solution should be, but in the meantime, the fix 
is definitely unacceptable.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: ACPI_DOCK bug: noone cares

2006-07-09 Thread Linus Torvalds


On Sun, 9 Jul 2006, Brown, Len wrote:
 
 Fair enough. Reverted.
 
 I disagree with this decision, and would like to know what
 is necessary to reverse it.

Mistakes happen. Fair enough. They happen all the time. This time around, 
for the 2.6.18-rc1 thing, I had heard more than the usual nobody even 
reacted, as Andrew had held up two patch-series of his because of that 
issue..

So that makes me like it even less than usual when I'm told that a problem 
with something I merged was apparently known BEFORE IT WAS MERGED.

So Adrian's report on its own wouldn't have caused a revert. 

 If you address me directly when you are asking me to do something,
 that would really help me help you.

As far as I can tell, you were cc'd on all of these things, along with 
the linux-acpi mailing list.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: ACPI_DOCK bug: noone cares

2006-07-09 Thread Linus Torvalds


On Sun, 9 Jul 2006, Brown, Len wrote:
 
 So I ask you.  If I fix the Kconfig issue today, will you accept
 a push that restores this driver to 2.6.18?

Sure.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Resume on Apple Mac Mini

2006-06-17 Thread Linus Torvalds

It turns out that the Apple Mac Mini comes back from suspend in legacy 
mode.

The simplest and most obvious fix would seem to be the following trivial 
one-liner, which just enables ACPI mode after any suspend event.

Comments? Can anybody see any downsides to just doing something this 
obvious, and doing it unconditionally?  I've seen a much more complicated 
patch that only does this for Apple hardware, but I wouldn't be surprised 
at all if this is more common.

In fact, googling for the symptoms (nobody cared and acpi_irq and 
suspend) gives quite a number of hits, so I would not be surprised at 
all if this is not Apple-related at all, but that we should always have 
done this.

Comments? 

Linus

---
diff --git a/drivers/acpi/sleep/main.c b/drivers/acpi/sleep/main.c
index 56f861e..7b6c146 100644
--- a/drivers/acpi/sleep/main.c
+++ b/drivers/acpi/sleep/main.c
@@ -109,6 +108,11 @@ static int acpi_pm_enter(suspend_state_t
local_irq_restore(flags);
printk(KERN_DEBUG Back to C!\n);
 
+   /*
+* Make sure we're back in ACPI mode!
+*/
+   acpi_enable();  
+
/* restore processor state
 * We should only be here if we're coming back from STR or STD.
 * And, in the case of the latter, the memory image should have already
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Resume on Apple Mac Mini

2006-06-17 Thread Linus Torvalds


On Sat, 17 Jun 2006, Linus Torvalds wrote:

 It turns out that the Apple Mac Mini comes back from suspend in legacy 
 mode.

Actually, that wasn't it. My debugging patches use the RTC to save off 
information, and that causing a test-kernel to not compile the test, or 
just normal incompetence on my part, made me confirm this as working, 
when it isn't.

The thing that does work is to literally just set the SCI_ENABLE bit:

acpi_set_register(ACPI_BITREG_SCI_ENABLE, 1, ACPI_MTX_DO_NOT_LOCK);

after resume, _not_ the acpi_enable().

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: git pull on Linux/ACPI release tree

2006-01-09 Thread Linus Torvalds


On Mon, 9 Jan 2006, Linus Torvalds wrote:

 One thing we could do is to make it easier to apply a patch to a 
 _non_current_ branch.
   [ ... ]
 Do you think that kind of workflow would be more palatable to you? It 
 shouldn't be /that/ hard to make git-apply branch-aware... (It was part of 
 my original plan, but it is more work than just using the working 
 directory, so I never finished the thought).

Btw, this is true in a bigger sense: the things git does have largely 
been driven by user needs. Initially mainly mine, but things like 
git-rebase were from people who wanted to work as sub-maintainers (eg 
Junio before he became the head honcho for git itself).

But if there are workflow problems, let's try to fix them. The apply 
patches directly to another branch suggestion may not be sane (maybe it's 
too confusing to apply a patch and not actually see it in the working 
tree), but workflow suggestions in general are appreciated.

We've made switching branches about as efficient as it can be (but if the 
differences are huge, the cost of re-writing the working directory is 
never going to be low). But switching branches has the confusion factor 
(ie you forget which branch you're on, and apply a patch to your working 
branch instead of your development branch), so maybe there are other ways 
of doing the same thing that might be sensible..

So send suggestions to the git lists. Maybe they're insane and can't be 
done, but while I designed git to work with _my_ case (ie mostly merging 
tons of different trees and then having occasional big batches of 
patches), it's certainly _supposed_ to support other maintainers too..

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: git pull on Linux/ACPI release tree

2006-01-09 Thread Linus Torvalds


On Mon, 9 Jan 2006, Luben Tuikov wrote:
 
 Yes.  Ever since I started used git, I never used branch
 switching, but I do have git branches and I do use git branching.
 
 I basically have a branch per directory, whereby the object db
 is shared as is remotes/refs/etc, HEAD and index are not shared
 of course.
 
 This allows me to do a simple and fast cd to change/go to a
 different branch, since they are in different directories.
 So the time I wait to switch branches is the time the filesystem
 takes to do a cd.
 
 This also allows me to build/test/patch/work on branches
 simultaneously.

Yes. It has many advantages, and it's the approach I pushed pretty hard 
originally, but the many branches in the same tree approach seems to 
have become the more common one. Using many branches in the same tree is 
definitely the better approach for _distribution_, but that doesn't 
necessarily mean that it's the better one for development.

For example, you can have a git distribution tree with 20 different 
branches on kernel.org, but do development in 20 different trees with just 
one branch active - and when you do a git push to push out your branch 
in your development tree, it just updates that one branch on the 
distribution site.

So git certainly supports that kind of behaviour, but nobody I know 
actually does it that way (not even me, but since I tend to just merge 
other peoples code, I don't actually have multiple branches: I create 
temporary branches for one-off things, but don't maintain them that way).

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git pull on Linux/ACPI release tree

2006-01-08 Thread Linus Torvalds


On Mon, 9 Jan 2006, Martin Langhoff wrote:
 
 I think it does. All the tricky stuff that David and Junio have been
 discussing is actually done very transparently by
 
 git-rebase upstream

Yes, it's fairly easy to do. That said, I would actually discourage it. I 
haven't said anything to David, because he is obviously very comfy with 
the git usage, and it _does_ result in cleaner trees, so especially since 
the networking code ends up being the source of a lot of changes, the 
extra cleanup stage that David does might actually be worth it for that 
case.

But git is actually designed to have parallel development, and what David 
does is to basically artificially linearize it. We merge between us often 
enough that it doesn't really end up losing any historical information 
(since David can't linearize the stuff that we already merged), but in 
_theory_ what David does actually does remove the historical context.

So git-rebase is a tool that is designed to allow maintainers to (as the 
command says) rebase their own development and re-linearize it, so that 
they don't see the real history. It's basically the reverse of what Len is 
doing - Len mixes up his history with other peoples history in order to 
keep them in sync, while David bassically re-does his history to be on 
top of mine (to keep it _separate_).

The git-rebase means that David will always see the development he has 
done/merged as being on top of whatever my most recent tree is. It's 
actually a bit scary, because if something goes wrong when David re-bases 
things, he'll have to clean things up by hand, and git won't help him 
much, but hey, it works for him because (a) things seldom go wrong and (b) 
he appears so comfortable with the tool that he _can_ fix things up when 
they do go wrong.

And yes, git-rebase can be very convenient. It has some problems too 
(which is the other reason I don't try to convince other maintainers to 
use it): because it re-writes history, a change that _might_ have worked 
in its original place in history might no longer work after a rebase if it 
depended on something subtle that used to be true but no longer is in the 
new place that it has been rebased to.

Which just means that a commit that was tested and found to be working 
might suddenly not work any more, which can be very surprising (But I 
didn't change anything!).

On the other hand, this is no different from doing a merge of two 
independent streams of development, and getting a new bug that didn't 
exist in either of the two, just because they changed the assumptions of 
each other (ie not a _mismerge_, but simply two developers changing 
something that the other depended on it, and the bug only appears when 
both the working trees are merged and the end result no longer works).

So my suggested git usage is to _not_ play games. Neither do too-frequent
merges _nor_ play games with git-rebase.

That said, git-rebase (and associated tools like git-cherry-pick etc) 
can be a very powerful tool, especially if you've screwed something up, 
and want to clean things up. Re-doing history because you realized that a 
you did something stupid that you don't want to admit to anybody else.

So trying out git-rebase and git-cherry-pick just in case you decide to 
want to use them might be worthwhile. Making it part of your daily routine 
like David has done? Somewhat questionable, but hey, it seems to be 
working for David, and it does make some things much easier, so..

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git pull on Linux/ACPI release tree

2006-01-08 Thread Linus Torvalds


On Mon, 9 Jan 2006, Al Viro wrote:
 
 How do you deal with conflict resolution?  That's a genuine question -
 I'm not talking about deliberate misuse to hide an attack, just a normal
 situation when you have to resolve something like
 
 A:
   if (foo)
   bar
 
 B:
   if (foo  baz)
   bar
 
 A':
 #ifdef X
   if (foo)
   bar
   ...
 #endif
 
 merge of A' and B: trivial conflict

Actually, these days git is pretty good at it. Much better than CVS, 
certainly. You can see the conflict against my old tree, or conflict 
against the remote tree by using the --ours or --theirs flag to git 
diff respectively.

(Or diff conflict against common base: git diff --base).

So for your particular example with a trivial base file:

line1
2
3
if (foo)
bar
6
7
8

and then the changes you had as an example in the A' and B branches, if I 
from A' do a git pull . B, I get:

Trying really trivial in-index merge...
fatal: Merge requires file-level merging
Nope.
Merging HEAD with ad56343c578785b8d932224a8676615e7a3e191f
Merging: 
9d619225e3adecee6432a36d67d140e29b0acf62 A' case 
ad56343c578785b8d932224a8676615e7a3e191f B: case 
found 1 common ancestor(s): 
93765ba3f64e9c73438e52683fffa68e5a493df7 Base commit 
Auto-merging A 
CONFLICT (content): Merge conflict in A 

Automatic merge failed; fix up by hand

and then the file contains the contents

line1
2
3
 HEAD/A
#ifdef X
if (foo)
===
if (foo  baz)
 ad56343c578785b8d932224a8676615e7a3e191f/A
bar
6
#endif
7
8

ie it will have does a CVS-like merge for me, and I need to fix this up. 
However, to _help_ me fix it up, I can now see what the diff is aganst my 
original version (A'), with git diff --ours (the --ours is default, so 
it's unnecessary, but just to make it explicit):

* Unmerged path A
diff --git a/A b/A
index 06dd3bc..7334364 100644
--- a/A
+++ b/A
@@ -1,8 +1,12 @@
 line   1
2
3
+ HEAD/A
 #ifdef X
if (foo)
+===
+   if (foo  baz)
+ ad56343c578785b8d932224a8676615e7a3e191f/A
bar
6
 #endif

which is very helpful especially once I have resolved it. IOW, I just edit 
the file and do the trivial resolve, and now I can do a git diff again 
to make sure that it looks ok:

* Unmerged path A
diff --git a/A b/A
index 06dd3bc..924fc97 100644
--- a/A
+++ b/A
@@ -2,7 +2,7 @@ line1
2
3
 #ifdef X
-   if (foo)
+   if (foo  baz)
bar
6
 #endif

ahh, looks good, so I just do git commit A and that creates the 
resolved merge.

  The obvious way (edit file in question, update-index, commit) will not 
 only leave zero information about said conflict and actions needed to 
 deal with it, but will lead to situation when git whatchanged will not 
 show anything useful.

Now, this is a real issue. 

The resolve part is pretty easy, but the fact that it's hard to see in 
git-whatchanged is a limitation of git-whatchanged. 

You need to use gitk, which _does_ know how to show merges as a diff 
(and yes, I just checked).

 Is there any SOP I'm missing here?

You're just missing the fact that git-whatchanged (or rather, 
git-diff-tree) isn't smart enough to show merges nicely. It really 
_should_. It doesn't. You can choose to show merges with the -m flag, 
but that will show diffs against each parent, which really isn't what you 
want. 

I should do the same thing gitk does in git-diff-tree.

 Worse (for my use), format-patch on such tree will not give a useful patchset.
 You get a series of patches that won't apply to _any_ tree. 

Now, git-diff-tree _does_ do that. Use the -m flag, and choose the tree 
you want.

And btw, that works with git-whatchanged too. You _can_ pass the -m 
flag to git-whatchanged, and it will show you each side of the merge 
correctly. So it _works_. It's just such a horrible format that by default 
it prefers to shut up about merges entirely. 

(I don't know of a good three-way diff format. gitk can do it, because 
gitk can show colors. That's a big deal when you do three-way - or 
more-way - diffs).

 And that's a fundamental problem behind all that rebase activity, AFAICS.

You do _not_ want to rebase a merge. It not only won't work, it's against 
the whole point of rebasing.

Rebasing is really only a valid operation when you have a few patches OF