Re: 2.6.22 regression: thermal trip points
Hi!

> > > > For the upstream kernel, I think it is more appropriate to expose
> > > > and fix the fundamental problems. For distro kernels, I'm less
> > > > concerned if you hide bugs instead of fixing them.
> > >
> > > This is okay as long as you are willing to work around the fundamental
> > > problems in kernel. You are unable to _fix_ them. They are broken
> > > BIOSes.
> >
> > The thing Linux needs to figure out is why Windows doesn't
> > get confused by what Linux claims to be broken BIOS.
>
> Why do you assume that Windows work? Yes, they probably will not have
> 'machine runs at 50% speed' problem, but I'd be very surprised if
> critical shutdown worked properly on more than 90% of notebooks
>
> > So far I have one live sighting to be addressed by
> > the upstream kernel (from Knut). I'm certainly looking
> > forward to the 2nd live sighting...
>
> Ok, I guess I should steal that old xe3 I was talking about...

Done, xe3 was re-built from parts.

/proc/acpi/.../trip_points:

critical (S5): 100 C
passive: 83 C...
active[0]: 100 C...

(Hmm, active=critical? Interesting. Fortunately the fan seems to be
driven by the BIOS.)

Temperature is ~63 C in "normal" use.

Now let's simulate fan failure... and let's load the cpu... temperature
slowly rises:

1min00 -- 72C
1min15 -- 75C
1min30 -- 77C
1min45 -- 80C
2min00 -- 82C
2min15 -- 83C
2min45 -- sudden powerdown, presumably because of hardware failsafe.

So we have two bugs here: the machine should have attempted to use
passive cooling sooner, so that the critical temperature would not be
reached, and the machine should have attempted shutdown before the
hardware failsafe killed the power.

I could do both in 2.6.21, by echoing new trip points and enabling
polling.
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
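
[Editor's sketch] A rough back-of-the-envelope check of how fast the xe3 heated up, using only the first four readings from the message above (72 C at 1min00 through 80 C at 1min45). The function names and the 2-second polling interval are illustrative assumptions, not anything taken from the kernel; the point is only to show how little the temperature moves between two polls at that rate.

```python
from typing import List, Tuple

# (seconds since the load started, degrees C): the first four readings
# reported in the message above.
READINGS: List[Tuple[int, int]] = [(60, 72), (75, 75), (90, 77), (105, 80)]


def heating_rate(samples: List[Tuple[int, int]]) -> float:
    """Average rise in degrees C per second between first and last sample."""
    (t0, c0), (t1, c1) = samples[0], samples[-1]
    return (c1 - c0) / (t1 - t0)


def rise_per_poll(rate_c_per_s: float, poll_interval_s: float) -> float:
    """How far the temperature can climb between two consecutive polls."""
    return rate_c_per_s * poll_interval_s


def seconds_to_reach(current_c: float, target_c: float, rate_c_per_s: float) -> float:
    """Time until target_c is reached if the rate stays constant (an assumption)."""
    return (target_c - current_c) / rate_c_per_s
```

With these readings the average rate is 8 C over 45 s, so a polling interval of a few seconds would see well under 1 C of change between samples.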
Re: 2.6.22 regression: thermal trip points
On Tue 2007-08-07 14:58:45, Len Brown wrote:
> On Monday 06 August 2007 05:55, Pavel Machek wrote:
> > > For the upstream kernel, I think it is more appropriate to expose
> > > and fix the fundamental problems. For distro kernels, I'm less
> > > concerned if you hide bugs instead of fixing them.
> >
> > This is okay as long as you are willing to work around the fundamental
> > problems in kernel. You are unable to _fix_ them. They are broken
> > BIOSes.
>
> The thing Linux needs to figure out is why Windows doesn't
> get confused by what Linux claims to be broken BIOS.

Why do you assume that Windows work? Yes, they probably will not have
'machine runs at 50% speed' problem, but I'd be very surprised if
critical shutdown worked properly on more than 90% of notebooks

> So far I have one live sighting to be addressed by
> the upstream kernel (from Knut). I'm certainly looking
> forward to the 2nd live sighting...

Ok, I guess I should steal that old xe3 I was talking about... Vojtech,
could I have that machine from table football room for a few
experiments? I keep using it as counterexample.

								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: 2.6.22 regression: thermal trip points
On Monday 06 August 2007 05:55, Pavel Machek wrote:
> > For the upstream kernel, I think it is more appropriate to expose
> > and fix the fundamental problems. For distro kernels, I'm less
> > concerned if you hide bugs instead of fixing them.
>
> This is okay as long as you are willing to work around the fundamental
> problems in kernel. You are unable to _fix_ them. They are broken
> BIOSes.

The thing Linux needs to figure out is why Windows doesn't
get confused by what Linux claims to be broken BIOS.

So far I have one live sighting to be addressed by
the upstream kernel (from Knut). I'm certainly looking
forward to the 2nd live sighting...

-Len
Re: 2.6.22 regression: thermal trip points
Hi!

> > If we have something like this, we could still discuss a config option,
> > that also allows to increase trip points, marking it with "If you set
> > this you can destroy your machine, you have been warned...". While this
> > would not be an option for distributions to compile in, some people may
> > come around the biggest hammer -> overriding DSDT.
> >
> > I cannot promise, but I try to get this for 2.6.24.
>
> I think if you are enamored with overriding trip points at SuSE,
> that you should simply restore the original scheme as the "value add"
> for SuSE kernels. Seriously, I'm totally fine with that.
>
> You should be aware, however, that (one of) the fundamental flaws
> with that scheme, shared with what you describe above, is that the OS
> can not actually change the trip points in the thermal sensor.
> The sensor is going to trip at the temperature that _it_ thinks

Yep, you work around this one by enabling polling.

> This faking out the user, plus the fact that the BIOS does change
> trip-points at run-time, made the original scheme fundamentally
> unsound. Further, I've not yet found a single system where use

Yes, this one is uglier. But maybe "enable polling automatically +
ignore any updates from bios" (+ maybe "only enable lowering") is
better solution than "just remove the knob"? After all, "the knob" is
still useful for debugging at least.

> of this scheme wasn't papering over some other problem. For the
> upstream kernel, I think it is more appropriate to expose and fix
> the fundamental problems. For distro kernels, I'm less concerned
> if you hide bugs instead of fixing them.

This is okay as long as you are willing to work around the fundamental
problems in kernel. You are unable to _fix_ them. They are broken
BIOSes.

								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: 2.6.22 regression: thermal trip points
On Friday 03 August 2007 07:16, Thomas Renninger wrote:
> On Thu, 2007-08-02 at 20:38 +0200, Andi Kleen wrote:
> > On Thu, Aug 02, 2007 at 03:57:54PM +, Pavel Machek wrote:
> > > On Thu 2007-08-02 15:16:22, Andi Kleen wrote:
> > > > On Thu, Aug 02, 2007 at 02:04:42PM +0100, Alan Cox wrote:
> > > > > > > Set a taint flag,
> > > > > > That's hardly any useful if the machine is dead afterwards.
> > > > >
> > > > > It won't be the hardware will do a failsafe shutdown first.
> > > >
> > > > Not necessarily. At SUSE we had at least one broken laptop
> > > > with wrong trip points. The machine ran very hot for some time
> > > > and afterwards the hard disk was dead.
> > >
> > > Yes, but it was original BIOS trip points that were wrong. And yes,
> > > its failsafe shutdown was too late. At least lowering the trip points
> > > would allow me to run it safely.
> >
> > I have no problem with lowering them (in fact I proposed this
> > to Thomas as a possible solution at some point). Just rising
> > is a bad idea.
>
> Ok.
> If nobody screams (especially Len who has to accept this in the end, I
> don't want to do work for nothing..), I'll try an implementation that:
>   - Allows lowering trip points
>   - If BIOS modifies trip points, the overridden ones might also
>     get lowered if they are even lower
>   - Allow the definition of a passive trip point (with some default
>     values for hysteresis), even if the thermal zone does not
>     provide one
>
> If we have something like this, we could still discuss a config option,
> that also allows to increase trip points, marking it with "If you set
> this you can destroy your machine, you have been warned...". While this
> would not be an option for distributions to compile in, some people may
> come around the biggest hammer -> overriding DSDT.
>
> I cannot promise, but I try to get this for 2.6.24.

I think if you are enamored with overriding trip points at SuSE,
that you should simply restore the original scheme as the "value add"
for SuSE kernels. Seriously, I'm totally fine with that.

You should be aware, however, that (one of) the fundamental flaws
with that scheme, shared with what you describe above, is that the OS
can not actually change the trip points in the thermal sensor.
The sensor is going to trip at the temperature that _it_ thinks
the trip point is at -- not the trip point that you are letting
the user think it is at. Ie. what is advertised as a trip-point
override actually defeats the entire concept of trip-points,
and it is mandatory that you enable periodic polling of the
current temperature to compare with your new thresholds
to work-around that.

This faking out the user, plus the fact that the BIOS does change
trip-points at run-time, made the original scheme fundamentally
unsound. Further, I've not yet found a single system where use
of this scheme wasn't papering over some other problem. For the
upstream kernel, I think it is more appropriate to expose and fix
the fundamental problems. For distro kernels, I'm less concerned
if you hide bugs instead of fixing them.

We had quite a long discussion when I deleted the
trip-point-override scheme in -mm. Then it rode through the entire
2.6.22 release cycle. However, I have yet to see a single bug report
filed that has shown that Linux should be doing this, or something
like it. I'm hopeful that Knut's or Adrian's will be the first --
but I'm still waiting.

-Len
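
[Editor's sketch] The polling requirement Len describes can be illustrated in a few lines. This is not kernel code: `classify`, `poll`, and the threshold values used below are hypothetical names and numbers chosen for illustration. It shows why an override only works with periodic polling: the sensor will not raise an event at the overridden value, so software has to read the temperature itself and compare it with its own thresholds.

```python
from typing import Callable, Dict, List


def classify(temp_c: int, trips: Dict[str, int]) -> str:
    """Compare one temperature reading with software-defined thresholds.

    A threshold of 0 (or a missing key) is treated as "not set"; that
    convention is an assumption of this sketch.
    """
    critical = trips.get("critical", 0)
    passive = trips.get("passive", 0)
    if critical and temp_c >= critical:
        return "shutdown"
    if passive and temp_c >= passive:
        return "throttle"
    return "none"


def poll(read_temp: Callable[[], int], trips: Dict[str, int], count: int) -> List[str]:
    """Take `count` readings and return the action implied by each one.

    A real poller would sleep for the polling interval between readings;
    that is left out so the sketch stays deterministic.
    """
    return [classify(read_temp(), trips) for _ in range(count)]
```

Passing the temperature source in as a callable keeps the comparison logic separate from how the reading is obtained, which is also what makes the sketch testable without hardware.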
Re: 2.6.22 regression: thermal trip points
On Friday 03 August 2007 07:43, Renato S. Yamane wrote:
> Len Brown wrote:
> > On Thursday 02 August 2007 04:40, Knut Petersen wrote:
> >> mainboard: AOpen i915GMm-hfs, AWARD BIOS
> >> cpu: Pentium-M 750 (0.8 to 1.86 GHz)
> >> openSuSE 10.2 with kernel 2.6.22.1
> >>
> >> The cpu fan can not be controlled by linux kernel.
> >> The BIOS will switch on the cpu fan a bit above 50 deg. Celsius.
> >> The active and passive trip points both are set to 50 deg. Celsius.
> >> Temperature of the idle cpu at 800 MHz: 34 to 42 deg. C.
> >> The BIOS never changes the trip points.
> >> Cpufreq does work perfectly.
>
> On my Toshiba M45-S355 (Toshiba Bios, Pentium M 750 - 0.8 at 1.86GHz,
> Debian Etch) I see the same using Kernel 2.6.21.6
>
> >> Previously there was the possibility to add something like
> >>
> >> echo "100:0:65:70:0" > /proc/acpi/thermal_zone/THRM/trip_points
> >> echo 2 > /proc/acpi/thermal_zone/THRM/polling_frequency
> >> echo ondemand > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
>
> I never do that, but see below (Kernel 2.6.21.6):
>
> cat /proc/acpi/thermal_zone/TZCL/trip_points
> critical (S5): 105 C
>
> cat /proc/acpi/thermal_zone/TZCL/polling_frequency
> polling frequency: 2 seconds
>
> cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
> ondemand

Renato,
I don't understand how your Toshiba is similar to Knut's Aopen.
You've got a single critical trip point at 105C, but no
active or passive trip points. Are you reporting some kind of failure?

The only thing wrong with your system is that polling_frequency != 0
-- but that is probably a distro configuration issue rather than
a kernel issue.

thanks,
-Len
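
[Editor's sketch] Len's reading of Renato's output (one critical trip point, no active or passive ones) can be reproduced mechanically. The parser below is a sketch that assumes one entry per line in the shape quoted in this thread ("critical (S5): 105 C", "passive:83 C...", "active[0]: 100 C..."). It is based only on those samples, not on a specification of the /proc format, and `parse_trip_points` is a hypothetical name.

```python
import re
from typing import Dict

# Matches e.g. "critical (S5): 105 C", "passive:83 C...", "active[0]: 100 C..."
_TRIP_RE = re.compile(
    r"^\s*(critical|hot|passive|active\[\d+\])[^:\n]*:\s*(\d+)\s*C",
    re.MULTILINE,
)


def parse_trip_points(text: str) -> Dict[str, int]:
    """Map each trip point label found in `text` to its temperature in C."""
    return {m.group(1): int(m.group(2)) for m in _TRIP_RE.finditer(text)}


def has_cooling_trip_points(trips: Dict[str, int]) -> bool:
    """True if the zone reports a passive or at least one active trip point."""
    return any(k == "passive" or k.startswith("active[") for k in trips)
```

Applied to Renato's output this yields only a `critical` entry, which matches Len's observation that the zone defines no active or passive trip points.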
Re: 2.6.22 regression: thermal trip points
On Friday 03 August 2007 08:53, Knut Petersen wrote:
> Len Brown:
> >
> > Thanks for the sighting, Knut!
> > This regression is dramatic when put in the terms of 50% performance hit!
> > I guess the good news is that thermal throttling is doing the job
> > we are asking it to:-)
> >
> Thermal management by cpufreq is working really fine ;-)

Unfortunately, a lot of people don't understand the ";-)" after this
statement and they really think that cpufreq is a solution for thermal
management. It isn't. Systems still need to be thermally sane when
they are fully utilized and cpufreq helps not.

> My problems are definitely not related to a linux bug. All trip_points
> are fixed, hardcoded in the system BIOS at address 0x000FF810.
>
> Yes, I could hack and flash a custom BIOS.
>
> After reading a lot I think I even could fix the DSDT.

No, you should never have to override your BIOS -- except for debugging.
If Windows works out-of-the-box on this system, then Linux should too
- even if we have to use a DMI-based workaround for a BIOS bug.

I'm looking forward to seeing the bug report that you are going to file.
Please include the dmidecode output in addition to the acpidump output.

thanks,
-Len
Re: 2.6.22 regression: thermal trip points
Len Brown:
>
> Thanks for the sighting, Knut!
> This regression is dramatic when put in the terms of 50% performance hit!
> I guess the good news is that thermal throttling is doing the job
> we are asking it to:-)
>

Thermal management by cpufreq is working really fine ;-)

My problems are definitely not related to a linux bug. All trip_points
are fixed, hardcoded in the system BIOS at address 0x000FF810.

Yes, I could hack and flash a custom BIOS.

After reading a lot I think I even could fix the DSDT.

But all that would only be a solution for my system.

The principal question is, if that hook that allowed to override
unreasonable trip point definitions is too dangerous to be a part of
the linux kernel.

You and some others believed it should not be part of the kernel, and
so it was eliminated a while ago. Some people want it back, either
because

  - they need it desperately to allow their machines healthy operation,
  - they need it to restore performance of their machines, or
  - they want a really quiet system.

Root should be allowed to smoke his system - ask him if he really wants
to do so, ask him to echo "Yes, it's me who is guilty" to some file
prior to allow trip point changes, but do not eliminate hooks useful
for the management of buggy machines from our kernel.

We do need writable trip points again. And, Thomas, some people also
need to raise the defaults.

cu,
Knut
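
[Editor's sketch] Knut's "ask root to confirm first" idea amounts to a guard in front of the write path. The class below is purely illustrative: `TripPointKnob` and its methods are hypothetical, and nothing like it is claimed to exist in any kernel. The starting value of 50 C and the new value of 65 C in the usage note are the passive values Knut mentions for his board.

```python
from typing import Dict


class TripPointKnob:
    """Refuses trip point changes until the risk has been acknowledged."""

    # The phrase suggested in the message above.
    ACK_PHRASE = "Yes, it's me who is guilty"

    def __init__(self, trips: Dict[str, int]) -> None:
        self.trips = dict(trips)
        self.acknowledged = False

    def acknowledge(self, phrase: str) -> bool:
        """Record whether the exact confirmation phrase was supplied."""
        self.acknowledged = phrase.strip() == self.ACK_PHRASE
        return self.acknowledged

    def set_trip(self, name: str, value_c: int) -> None:
        """Change one trip point, but only after acknowledgement."""
        if not self.acknowledged:
            raise PermissionError("acknowledge the risk before changing trip points")
        self.trips[name] = value_c
```

For example, a knob created with `{"passive": 50}` rejects `set_trip("passive", 65)` until `acknowledge()` has been called with the exact phrase.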
Re: 2.6.22 regression: thermal trip points
Len Brown wrote:
> On Thursday 02 August 2007 04:40, Knut Petersen wrote:
>> mainboard: AOpen i915GMm-hfs, AWARD BIOS
>> cpu: Pentium-M 750 (0.8 to 1.86 GHz)
>> openSuSE 10.2 with kernel 2.6.22.1
>>
>> The cpu fan can not be controlled by linux kernel.
>> The BIOS will switch on the cpu fan a bit above 50 deg. Celsius.
>> The active and passive trip points both are set to 50 deg. Celsius.
>> Temperature of the idle cpu at 800 MHz: 34 to 42 deg. C.
>> The BIOS never changes the trip points.
>> Cpufreq does work perfectly.

On my Toshiba M45-S355 (Toshiba Bios, Pentium M 750 - 0.8 at 1.86GHz,
Debian Etch) I see the same using Kernel 2.6.21.6

>> Previously there was the possibility to add something like
>>
>> echo "100:0:65:70:0" > /proc/acpi/thermal_zone/THRM/trip_points
>> echo 2 > /proc/acpi/thermal_zone/THRM/polling_frequency
>> echo ondemand > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

I never do that, but see below (Kernel 2.6.21.6):

cat /proc/acpi/thermal_zone/TZCL/trip_points
critical (S5): 105 C

cat /proc/acpi/thermal_zone/TZCL/polling_frequency
polling frequency: 2 seconds

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
ondemand

Regards,
Renato S. Yamane
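
[Editor's sketch] The colon-separated string Knut used to write to trip_points can be split into named fields. The field order assumed here (critical, hot, passive, then active[0], active[1], ...) is the editor's recollection of the old /proc interface and is not stated anywhere in this thread, so treat it as an assumption; `parse_trip_override` is a hypothetical helper.

```python
from typing import Dict, List, Union

TripOverride = Dict[str, Union[int, List[int]]]


def parse_trip_override(text: str) -> TripOverride:
    """Split a string such as "100:0:65:70:0" into named trip points.

    ASSUMPTION: the fields are critical:hot:passive:active0:active1...
    """
    values = [int(v) for v in text.strip().split(":")]
    if len(values) < 3:
        raise ValueError("expected at least critical:hot:passive")
    critical, hot, passive = values[:3]
    return {
        "critical": critical,
        "hot": hot,
        "passive": passive,
        "active": values[3:],
    }
```

Under that assumption, Knut's string sets critical to 100 C and passive to 65 C, leaving the remaining fields as shown in the parsed result.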
Re: 2.6.22 regression: thermal trip points
On Thu, 2007-08-02 at 20:38 +0200, Andi Kleen wrote:
> On Thu, Aug 02, 2007 at 03:57:54PM +, Pavel Machek wrote:
> > On Thu 2007-08-02 15:16:22, Andi Kleen wrote:
> > > On Thu, Aug 02, 2007 at 02:04:42PM +0100, Alan Cox wrote:
> > > > > > Set a taint flag,
> > > > > That's hardly any useful if the machine is dead afterwards.
> > > >
> > > > It won't be the hardware will do a failsafe shutdown first.
> > >
> > > Not necessarily. At SUSE we had at least one broken laptop
> > > with wrong trip points. The machine ran very hot for some time
> > > and afterwards the hard disk was dead.
> >
> > Yes, but it was original BIOS trip points that were wrong. And yes,
> > its failsafe shutdown was too late. At least lowering the trip points
> > would allow me to run it safely.
>
> I have no problem with lowering them (in fact I proposed this
> to Thomas as a possible solution at some point). Just rising
> is a bad idea.

Ok.
If nobody screams (especially Len who has to accept this in the end, I
don't want to do work for nothing..), I'll try an implementation that:
  - Allows lowering trip points
  - If BIOS modifies trip points, the overridden ones might also
    get lowered if they are even lower
  - Allow the definition of a passive trip point (with some default
    values for hysteresis), even if the thermal zone does not
    provide one

If we have something like this, we could still discuss a config option,
that also allows to increase trip points, marking it with "If you set
this you can destroy your machine, you have been warned...". While this
would not be an option for distributions to compile in, some people may
come around the biggest hammer -> overriding DSDT.

I cannot promise, but I try to get this for 2.6.24.

   Thomas
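
[Editor's sketch] The three rules in Thomas's proposal reduce to a small "lower only" merge of the BIOS value and the user's override. This is one reading of the proposal, not his implementation; `effective_trip` is a hypothetical name and `None` is used here to mean "not provided".

```python
from typing import Optional


def effective_trip(bios_c: Optional[int], override_c: Optional[int]) -> Optional[int]:
    """Combine a BIOS trip point with a user override, never raising it.

    - No BIOS value (e.g. a zone without a passive trip point): the
      override defines the trip point.
    - No override: the BIOS value is used unchanged.
    - Both present: the lower one wins, so an override can only lower the
      trip point, and a BIOS update below the override still takes effect.
    """
    if bios_c is None:
        return override_c
    if override_c is None:
        return bios_c
    return min(bios_c, override_c)
```

Re-evaluating the function whenever the BIOS reports changed trip points covers the second rule: if the BIOS drops below the override at run time, the result follows the BIOS.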
Re: 2.6.22 regression: thermal trip points
On Thu, 2007-08-02 at 20:38 +0200, Andi Kleen wrote: On Thu, Aug 02, 2007 at 03:57:54PM +, Pavel Machek wrote: On Thu 2007-08-02 15:16:22, Andi Kleen wrote: On Thu, Aug 02, 2007 at 02:04:42PM +0100, Alan Cox wrote: Set a taint flag, That's hardly any useful if the machine is dead afterwards. It won't be the hardware will do a failsafe shutdown first. Not necessarily. At SUSE we had at least one broken laptop with wrong trip points. The machine ran very hot for some time and afterwards the hard disk was dead. Yes, but it was original BIOS trip points that were wrong. And yes, its failsafe shutdown was too late. At least lowering the trip points would allow me to run it safely. I have no problem with lowering them (in fact I proposed this to Thomas as a possible solution at some point). Just rising is a bad idea. Ok. If nobody screams (especially Len who has to accept this in the end, I don't want to do work for nothing..), I'll try an implementation that: - Allows lowering trip points - If BIOS modifies trip points, the overridden ones might also get lowered if they are even lower - Allow the definition of a passive trip point (with some default values for hysteresis), even if the thermal zone does not provide one If we have something like this, we could still discuss a config option, that also allows to increase trip points, marking it with If you set this you can destroy your machine, you have been warned While this would not be an option for distributions to compile in, some people may come around the biggest hammer - overriding DSDT. I cannot promise, but I try to get this for 2.6.24. Thomas - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22 regression: thermal trip points
Len Brown : Thanks for the sighting, Knut! This regression is dramatic when put in the terms of 50% performance hit! I guess the good news is that thermal throttling is doing the job we are asking it to:-) Thermal management by cpufreq is working really fine ;-) My problems are definitely not related to a linux bug. All trip_points are fixed, hardcoded in the system BIOS at address 0x000FF810. Yes, I could hack and flash a custom BIOS. After reading a lot I think I even could fix the DSDT. But all that would only be a solution for my system. The principal question is, if that hook that allowed to override unreasonable trip point definitions is too dangerous to be a part of the linux kernel. You and some others believed it should not be part of the kernel, and so it was eliminated a while ago. Some people want it back, either because - they need it desperately to allow their machines healthy operation, - they need it to restore performance of their machines, or - they want a really quiet system. Root should be allowed to smoke his system - ask him if he really wants to do so, ask him to echo Yes, it´s me who is guilty to some file prior to allow trip point changes, but do not eliminate hooks useful for the management of buggy machines from our kernel. We do need writable trip points again. And, Thomas, some people also need to raise the defaults. cu, Knut - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22 regression: thermal trip points
Len Brown escreveu: On Thursday 02 August 2007 04:40, Knut Petersen wrote: mainboard: AOpen i915GMm-hfs, AWARD BIOS cpu: Pentium-M 750 (0.8 to 1.86 MHz) openSuSE 10.2 with kernel 2.6.22.1 The cpu fan can not be controled by linux kernel. The BIOS will switch on the cpu fan a bit above 50 deg. Celsius. The active and passive trip points both are set to 50 deg. Celsius. Temperature of the idle cpu at 800 Mhz: 34 to 42 deg. C. The BIOS never changes the trip points. Cpufreq does work perfectly. On my Toshiba M45-S355 (Toshiba Bios, Pentium M 750 - 0.8 at 1.86GHz, Debian Etch) I see the same using Kernel 2.6.21.6 Previously there was the possibility to add something like echo 100:0:65:70:0 /proc/acpi/thermal_zone/THRM/trip_points echo 2 /proc/acpi/thermal_zone/THRM/polling_frequency echo ondemand /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor I never do that, but see below (Kernel 2.6.21.6): cat /proc/acpi/thermal_zone/TZCL/trip_points critical (S5): 105 C cat /proc/acpi/thermal_zone/TZCL/polling_frequency polling frequency: 2 seconds cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor ondemand Regards, Renato S. Yamane - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22 regression: thermal trip points
On Friday 03 August 2007 08:53, Knut Petersen wrote: Len Brown : Thanks for the sighting, Knut! This regression is dramatic when put in the terms of 50% performance hit! I guess the good news is that thermal throttling is doing the job we are asking it to:-) Thermal management by cpufreq is working really fine ;-) Unfortunately, I a lot of people don't understand that the ;-) after this statement and they really think that cpufreq is a solution for thermal management. It isn't. Systems still need to be thermally sane when they are fully utilized and cpufreq helps not. My problems are definitely not related to a linux bug. All trip_points are fixed, hardcoded in the system BIOS at address 0x000FF810. Yes, I could hack and flash a custom BIOS. After reading a lot I think I even could fix the DSDT. No, you should never have to override your BIOS -- except for debugging. If Windows works out-of-the-box on this system, then Linux should too - even if we have to use a DMI-based workaround for a BIOS bug. I'm looking forward to seeing the bug report that you are going to file. Please include the dmidecode output in addition to the acpidump output. thanks, -Len - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22 regression: thermal trip points
On Friday 03 August 2007 07:43, Renato S. Yamane wrote:
> Len Brown wrote:
> > On Thursday 02 August 2007 04:40, Knut Petersen wrote:
> > > mainboard: AOpen i915GMm-hfs, AWARD BIOS
> > > cpu: Pentium-M 750 (0.8 to 1.86 MHz)
> > > openSuSE 10.2 with kernel 2.6.22.1
> > >
> > > The cpu fan can not be controled by linux kernel.
> > > The BIOS will switch on the cpu fan a bit above 50 deg. Celsius.
> > > The active and passive trip points both are set to 50 deg. Celsius.
> > > Temperature of the idle cpu at 800 Mhz: 34 to 42 deg. C.
> > > The BIOS never changes the trip points.
> > > Cpufreq does work perfectly.
>
> On my Toshiba M45-S355 (Toshiba Bios, Pentium M 750 - 0.8 at 1.86GHz, Debian Etch) I see the same using Kernel 2.6.21.6
>
> > > Previously there was the possibility to add something like
> > >
> > > echo "100:0:65:70:0" > /proc/acpi/thermal_zone/THRM/trip_points
> > > echo 2 > /proc/acpi/thermal_zone/THRM/polling_frequency
> > > echo ondemand > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
>
> I never do that, but see below (Kernel 2.6.21.6):
>
> cat /proc/acpi/thermal_zone/TZCL/trip_points
> critical (S5): 105 C
>
> cat /proc/acpi/thermal_zone/TZCL/polling_frequency
> polling frequency: 2 seconds
>
> cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
> ondemand

Renato, I don't understand how your Toshiba is similar to Knut's AOpen. You've got a single critical trip point at 105C, but no active or passive trip points. Are you reporting some kind of failure?

The only thing wrong with your system is that polling_frequency != 0 -- but that is probably a distro configuration issue rather than a kernel issue.

thanks,
-Len
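Whether a zone has only a critical trip point, or passive and active ones as well, can be read straight off the trip_points text. The following is a minimal Python sketch, assuming the output layout matches the samples quoted in this thread (`critical (S5): 105 C`, `passive:83 C`, `active[0]: 100 C`). The function names are made up for illustration and do not belong to any kernel or ACPI tool.

```python
import re

# Matches entries such as "critical (S5): 105 C", "passive:83 C" and
# "active[0]: 100 C". The layout is an assumption based on the outputs
# quoted in this thread, not on a documented format.
TRIP_RE = re.compile(r"(critical|hot|passive|active\[\d+\])[^:]*:\s*(\d+)\s*C")


def parse_trip_points(text):
    """Return a dict mapping trip point name to temperature in degrees C."""
    return {name: int(temp) for name, temp in TRIP_RE.findall(text)}


def describe(trips):
    """Report which of the critical/passive/active kinds a zone lacks."""
    kinds = {name.split("[")[0] for name in trips}
    missing = sorted({"critical", "passive", "active"} - kinds)
    return "missing: " + ", ".join(missing) if missing else "all kinds present"


# The Toshiba output quoted above: a lone critical trip point.
toshiba = parse_trip_points("critical (S5): 105 C")
print(toshiba, "->", describe(toshiba))

# The Omnibook xe3 output quoted earlier in the thread.
xe3 = parse_trip_points("critical (S5): 100 C passive:83 C... active[0]: 100 C...")
print(xe3, "->", describe(xe3))
```

For the Toshiba sample this reports that the passive and active kinds are missing, which is the observation made in the reply above.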
Re: 2.6.22 regression: thermal trip points
On Friday 03 August 2007 07:16, Thomas Renninger wrote:
> On Thu, 2007-08-02 at 20:38 +0200, Andi Kleen wrote:
> > On Thu, Aug 02, 2007 at 03:57:54PM +, Pavel Machek wrote:
> > > On Thu 2007-08-02 15:16:22, Andi Kleen wrote:
> > > > On Thu, Aug 02, 2007 at 02:04:42PM +0100, Alan Cox wrote:
> > > > > > > Set a taint flag,
> > > > > > That's hardly any useful if the machine is dead afterwards.
> > > > >
> > > > > It won't be the hardware will do a failsafe shutdown first.
> > > >
> > > > Not necessarily. At SUSE we had at least one broken laptop
> > > > with wrong trip points. The machine ran very hot for some time
> > > > and afterwards the hard disk was dead.
> > >
> > > Yes, but it was original BIOS trip points that were wrong. And yes,
> > > its failsafe shutdown was too late. At least lowering the trip points
> > > would allow me to run it safely.
> >
> > I have no problem with lowering them (in fact I proposed this
> > to Thomas as a possible solution at some point). Just rising
> > is a bad idea.
>
> Ok.
> If nobody screams (especially Len who has to accept this in the end, I
> don't want to do work for nothing..), I'll try an implementation that:
> - Allows lowering trip points
> - If BIOS modifies trip points, the overridden ones might also get
>   lowered if they are even lower
> - Allow the definition of a passive trip point (with some default
>   values for hysteresis), even if the thermal zone does not provide one
>
> If we have something like this, we could still discuss a config option,
> that also allows to increase trip points, marking it with
> "If you set this you can destroy your machine, you have been warned"
> While this would not be an option for distributions to compile in, some
> people may come around the biggest hammer - overriding DSDT.
>
> I cannot promise, but I try to get this for 2.6.24.

I think if you are enamored with overriding trip points at SuSE, that you should simply restore the original scheme as the value add for SuSE kernels. Seriously, I'm totally fine with that.
You should be aware, however, that (one of) the fundamental flaws with that scheme, shared with what you describe above, is that the OS can not actually change the trip points in the thermal sensor. The sensor is going to trip at the temperature that _it_ thinks the trip point is at -- not the trip point that you are letting the user think it is at. I.e. what is advertised as a trip-point override actually defeats the entire concept of trip-points, and it is mandatory that you enable periodic polling of the current temperature to compare with your new thresholds to work around that.

This faking out the user, plus the fact that the BIOS does change trip-points at run-time, made the original scheme fundamentally unsound.

Further, I've not yet found a single system where use of this scheme wasn't papering over some other problem.

For the upstream kernel, I think it is more appropriate to expose and fix the fundamental problems. For distro kernels, I'm less concerned if you hide bugs instead of fixing them.

We had quite a long discussion when I deleted the trip-point-override scheme in -mm. Then it rode through the entire 2.6.22 release cycle. However, I have yet to see a single bug report filed that has shown that Linux should be doing this, or something like it. I'm hopeful that Knut's or Adrian's will be the first -- but I'm still waiting.

-Len
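The point about polling can be made concrete with a small simulation. This is only a Python sketch under stated assumptions: the readings are invented, the function names are hypothetical, and the "sensor event" is modelled simply as the first reading at or above the firmware trip point. It is not how thermal.c is written.

```python
def event_driven_trigger(readings, firmware_trip_c):
    """Time at which the sensor itself would raise an event.

    The sensor only knows the firmware trip point, so a software
    override has no influence on this value.
    """
    for t, temp in readings:
        if temp >= firmware_trip_c:
            return t
    return None


def polled_trigger(readings, override_trip_c, poll_interval_s):
    """Time at which a periodic poller first sees the override crossed.

    An interval of 0 means polling is disabled, in which case the
    override is never noticed at all.
    """
    if poll_interval_s == 0:
        return None
    for t, temp in readings:
        # Only readings that fall on a poll tick are visible to the OS.
        if t % poll_interval_s == 0 and temp >= override_trip_c:
            return t
    return None


# Invented readings: (seconds since start, degrees C).
readings = [(0, 63), (15, 68), (30, 72), (45, 75), (60, 77), (75, 80), (90, 83)]

print(event_driven_trigger(readings, firmware_trip_c=83))                 # 90
print(polled_trigger(readings, override_trip_c=70, poll_interval_s=2))    # 30
print(polled_trigger(readings, override_trip_c=70, poll_interval_s=0))    # None
```

With polling disabled the lowered threshold of 70 is never acted upon and only the firmware value of 83 produces an event, which is why an override without polling gives the user a false picture.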
Re: 2.6.22 regression: thermal trip points
On Thursday 02 August 2007 05:45, Adrian Schröter wrote: > On Thursday 02 August 2007 11:42:27 wrote Thomas Renninger: > > On Thu, 2007-08-02 at 10:40 +0200, Knut Petersen wrote: > > > Hi everybody! > > > > > > Kernel 2.6.22 decreases performance by about 50% on my system. > > > No, I do not like that. The reason is a broken BIOS, granted, but there > > > was a perfect workaround in the kernel that has been dropped. > > > > > > mainboard: AOpen i915GMm-hfs, AWARD BIOS > > > cpu: Pentium-M 750 (0.8 to 1.86 MHz) > > > openSuSE 10.2 with kernel 2.6.22.1 > > > > Is this a DELL laptop that gets throttled by 75% to throttling state 6 > > if 60 degrees are exceeded? > > Adrian has such a machine..., no idea what is going on with that one, > > but only workaround to get any use out of this machine is to override at > > least the passive trip point. > > JFYI, there are plenty of these systems around, it was one out of four > standard Novell modells. I am mabye just the first one who uses Factory on > it, but expect more bugreports when 10.3 gets released ... That's very good news, Adrian. In the past all we had to go on was the memory of a machine that died several years ago. But if you've got a live failure, that is really valuable. Please go here http://bugzilla.kernel.org/enter_bug.cgi?product=ACPI and submit a new sighting vs. Power-Thermal and attach the output from acpidump, cat /proc/acpi/thermal_zone/*/* and assign it to [EMAIL PROTECTED] thanks, -Len - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22 regression: thermal trip points
On Thursday 02 August 2007 04:40, Knut Petersen wrote: > Kernel 2.6.22 decreases performance by about 50% on my system. > No, I do not like that. The reason is a broken BIOS, granted, but there > was a perfect workaround in the kernel that has been dropped. > > mainboard: AOpen i915GMm-hfs, AWARD BIOS > cpu: Pentium-M 750 (0.8 to 1.86 MHz) > openSuSE 10.2 with kernel 2.6.22.1 > > The cpu fan can not be controled by linux kernel. > The BIOS will switch on the cpu fan a bit above 50 deg. Celsius. > The active and passive trip points both are set to 50 deg. Celsius. > Temperature of the idle cpu at 800 Mhz: 34 to 42 deg. C. > The BIOS never changes the trip points. > Cpufreq does work perfectly. > > Previously there was the possibility to add something like > > echo "100:0:65:70:0" > /proc/acpi/thermal_zone/THRM/trip_points > echo 2 > /proc/acpi/thermal_zone/THRM/polling_frequency > echo ondemand > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor > > to e.g. /etc/init.d/boot.local. With 2.6.22 that solution does not exist > any longer. Now the code in thermal.c slows down the cpu under load > to prevent "overheating". Kernel compile time increases from about 12 > to 18 minutes. No, I don´t like that, nobody would. Thanks for the sighting, Knut! This regression is dramatic when put in the terms of 50% performance hit! I guess the good news is that thermal throttling is doing the job we are asking it to:-) The statement above regarding the existence of active trip points and the kernel not being able to control the fan are inconsistent with each other. Please open a sighting for this machine here: http://bugzilla.kernel.org/enter_bug.cgi?product=ACPI vs. Power-Thermal and attach the output from acpidump, cat /proc/acpi/thermal_zone/*/* and assign it to [EMAIL PROTECTED] BTW. does the board boot and run properly with "acpi=off"? 
thanks, -Len
Re: 2.6.22 regression: thermal trip points
Knut Petersen <[EMAIL PROTECTED]> writes:
> echo "I know what I am doing" > /proc/acpi/thermal_zone/THRM/enable_really_dangerous_options

There is a shorter version:

$ su
Password:
#

--
Krzysztof Halasa
Re: 2.6.22 regression: thermal trip points
On Thu, Aug 02, 2007 at 08:38:30PM +0200, Andi Kleen wrote:
> On Thu, Aug 02, 2007 at 03:57:54PM +, Pavel Machek wrote:
> > Yes, but it was original BIOS trip points that were wrong. And yes,
> > its failsafe shutdown was too late. At least lowering the trip points
> > would allow me to run it safely.
>
> I have no problem with lowering them (in fact I proposed this
> to Thomas as a possible solution at some point). Just rising
> is a bad idea.

Though for this to be reliable, you need to ignore any notifications that would raise the trip points while still paying attention to any that would lower them.

--
Matthew Garrett | [EMAIL PROTECTED]
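One way to read that rule is to always take the lower of the firmware value and the user value. The Python sketch below illustrates the bookkeeping only; the class and method names are hypothetical, and the ceiling check against the initially fetched value is an assumption taken from the "only lowered, compared to the initially fetched one" proposal elsewhere in the thread.

```python
class LowerOnlyTripPoint:
    """Tracks one trip point whose user override may only lower it."""

    def __init__(self, bios_c):
        self.initial_c = bios_c   # value fetched from firmware at boot
        self.bios_c = bios_c      # latest value reported by firmware
        self.user_c = None        # optional user override

    def set_user(self, value_c):
        # Refuse anything above the initially fetched firmware value.
        if value_c > self.initial_c:
            raise ValueError("trip points may only be lowered")
        self.user_c = value_c

    def on_firmware_notify(self, new_bios_c):
        # Always record the update; effective_c decides whether it matters.
        self.bios_c = new_bios_c

    @property
    def effective_c(self):
        if self.user_c is None:
            return self.bios_c
        # A firmware update above the override is ignored,
        # one below the override takes effect.
        return min(self.bios_c, self.user_c)


trip = LowerOnlyTripPoint(83)
trip.set_user(70)
trip.on_firmware_notify(90)   # firmware raises: override still applies
print(trip.effective_c)       # 70
trip.on_firmware_notify(60)   # firmware lowers below the override: it wins
print(trip.effective_c)       # 60
```

Keeping the firmware value and the user value separate, instead of overwriting one with the other, means a later firmware update can still pull the effective threshold down.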
Re: 2.6.22 regression: thermal trip points
On Thu, Aug 02, 2007 at 03:57:54PM +, Pavel Machek wrote:
> On Thu 2007-08-02 15:16:22, Andi Kleen wrote:
> > On Thu, Aug 02, 2007 at 02:04:42PM +0100, Alan Cox wrote:
> > > > > Set a taint flag,
> > > > That's hardly any useful if the machine is dead afterwards.
> > >
> > > It won't be the hardware will do a failsafe shutdown first.
> >
> > Not necessarily. At SUSE we had at least one broken laptop
> > with wrong trip points. The machine ran very hot for some time
> > and afterwards the hard disk was dead.
>
> Yes, but it was original BIOS trip points that were wrong. And yes,
> its failsafe shutdown was too late. At least lowering the trip points
> would allow me to run it safely.

I have no problem with lowering them (in fact I proposed this to Thomas as a possible solution at some point). Just rising is a bad idea.

-Andi
Re: 2.6.22 regression: thermal trip points
Hi!

> Well, it would not be the first time to eliminate a regression by
> reverting a patch after it was accepted previously.
>
> >> Sanity checks that trip points only can get lowered (compared to initial
> >> provided ones) needs to be added.
> >> Len, Rui: For short-term can some
>
> But I _need_ to raise the unreasonably low passive trip point. We could
> decide to protect the innocent user by allowing write access to
> trip_points only after a previous

Actually, you should lower your active trip point, and keep cpu temp below 50C.

> echo "I know what I am doing" > /proc/acpi/thermal_zone/THRM/enable_really_dangerous_options

No... but a patch that only permits lowering could be acceptable.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: 2.6.22 regression: thermal trip points
On Thu 2007-08-02 15:16:22, Andi Kleen wrote:
> On Thu, Aug 02, 2007 at 02:04:42PM +0100, Alan Cox wrote:
> > > > Set a taint flag,
> > > That's hardly any useful if the machine is dead afterwards.
> >
> > It won't be the hardware will do a failsafe shutdown first.
>
> Not necessarily. At SUSE we had at least one broken laptop
> with wrong trip points. The machine ran very hot for some time
> and afterwards the hard disk was dead.

Yes, but it was original BIOS trip points that were wrong. And yes, its failsafe shutdown was too late. At least lowering the trip points would allow me to run it safely.

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: 2.6.22 regression: thermal trip points
Hi!

> > I didn't understand the arguments either, actually.
>
> The issue is that you can actually kill hardware by setting this wrong.
> We've had such cases where trip point problems eventually lead
> to overheated laptops with hard disks dying etc.

Actually, that was my machine. Omnibook xe3; the BIOS-provided trip points *did* kill the disk. At least I was able to work around it by writing to trip points.

Yes, ACPI mandates emergency shutdown when critical+delta point is reached, *in hardware*. So this only endangers very broken machines, and it also fixes a lot of them.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: 2.6.22 regression: thermal trip points
On Thu, Aug 02, 2007 at 02:04:42PM +0100, Alan Cox wrote:
> > > Set a taint flag,
> > That's hardly any useful if the machine is dead afterwards.
>
> It won't be the hardware will do a failsafe shutdown first.

Not necessarily. At SUSE we had at least one broken laptop with wrong trip points. The machine ran very hot for some time and afterwards the hard disk was dead.

-Andi
Re: 2.6.22 regression: thermal trip points
> > Andi, would the above be mechanism sufficiently safe for your taste? > > No. I don't beleve Andi's taste (or lack thereof) is relevant to this discussion. He's not for example explained why its better to force people to disable all the APCI power and thermal control on their system rather than adjust trip points. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22 regression: thermal trip points
> > Set a taint flag, > That's hardly any useful if the machine is dead afterwards. It won't be the hardware will do a failsafe shutdown first. > You'll just end up with "Linux destroyed my laptop" headlines all > over the internet and rightfully very annoyed users. You have to systematically sit down and tweak your machine. > The philosophy didn't include physically destroying hardware > as far as I know. It most certainly did. With safety checks you could override. > > As root you can erase the bios, > We don't ship the devbios driver for good reasons. Thats debatably a bad reason (the user space API is wrong thats all), and one thats totally inconsistent with some of the other drivers we do ship. > > lock the hard disk with a random > > password, reflash your video card > > That all requires significant effort and custom software. It's not that we > have a one liner echo destroy > /sys/.../flash-bios. Well you can do the hard disk one in one line of perl, the video card one in a small bit of C. And this merely makes the argument that raising the trip points should be harder. Alan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22 regression: thermal trip points
On Thu, Aug 02, 2007 at 02:42:19PM +0200, Thomas Renninger wrote: > On Thu, 2007-08-02 at 12:56 +0100, Matthew Garrett wrote: > > The policy has been to attempt to be bug-compatible with Windows > > whenever possible for some time now. > *whenever possible* But there's no evidence whatsoever that this is something we can't handle... > > No, that's not the only reason for notifications. Alteration in hardware > > state may also force a recalculation of trip point (adding a battery to > > a bay rather than a DVD drive may require the platform to be kept at a > > lower temperature) > "I've seen no evidence that this happens...", but I see the point. It's explicitly mentioned as one of the use cases for trip point alteration in the spec. > > Surely people want this functionality so that they can raise trip > > points? > For Adrian it would be enough to be able to lower them. Which suggests that we're probably doing something wrong at some more fundamental level... > Also being able to define a passive trip point (even if not provided by > BIOS) could help a lot machines. I agree that being able to lower trip points is unlikely to result in hardware damage, but still think that it's likely to be papering over genuine bugs that we could fix properly. -- Matthew Garrett | [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22 regression: thermal trip points
On Thu, Aug 02, 2007 at 02:35:18PM +0200, Thomas Renninger wrote: > On Thu, 2007-08-02 at 13:15 +0100, Matthew Garrett wrote: > > That machine has no active thermal trip points, so I'm not sure how it's > > relevant here. > >From above: "Windows as I understand it has vendor mechanisms to..." > Maybe thermal trip points are not influenced here, it's at least about > thermal management and another prove that we cannot just try to copy > Windows behavior, but need to provide workarounds wherever possible. There's absolutely no evidence in the bug log there that the user's problems are in any way due to Windows-specific code. The SetSilentMode stuff is an additional item of functionality that underclocks various bits of hardware, not one that's actually required for the platform to function correctly. -- Matthew Garrett | [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22 regression: thermal trip points
On Thu, 2007-08-02 at 12:56 +0100, Matthew Garrett wrote: > On Thu, Aug 02, 2007 at 01:45:00PM +0200, Thomas Renninger wrote: > > On Thu, 2007-08-02 at 12:13 +0100, Matthew Garrett wrote: > > > I strongly suspect that the vast majority[1] of hardware that "needs" > > > the trip points changing works perfectly well under Windows, so it's > > > likely to be papering over bugs in the kernel. It'd be nice if we fixed > > > those rather than encouraging people to poke stuff into /proc, > > Some arguments against that: > > - You cannot tell a customer: Wait for the kernel in half a year. > > This is the time it at least needs until a laptop got sold, the > > problem is found, a patch is written and checked in and finally > > hits the distribution. > > We have to do so frequently. New hardware often exposes bugs in the > kernel. And often we can provide a boot param or whatever, that makes it at least useable. > > > - You can also not backport fixes as ACPI patches mostly have the > > potential to break other machines/BIOSes > > - There also exist the policy to not fix up/workaround totally broken > > AML BIOS implementations > > The policy has been to attempt to be bug-compatible with Windows > whenever possible for some time now. *whenever possible* > > > - We do not need to and never will be able to copy or do the same > > Windows is doing > > Given that many vendors still only test against Windows, that's exactly > what we need to do. But we cannot (copy all windows (mis-)behavior). > > > > especially when doing so is guaranteed to break in really confusing ways > > > with a lot of hardware. The firmware can reset the trip points at > > > essentially arbitrary times and is well within its rights to expect the > > > OS to actually pay attention to them. > > What the hell is so wrong with: > > > > Let the user override the trip points. If he does so, ignore > > thermal trip point updates from BIOS. 
Don't care for hysteresis
> > BIOS implementations (these are the BIOS trip point updates).
>
> No, that's not the only reason for notifications. Alteration in hardware
> state may also force a recalculation of trip point (adding a battery to
> a bay rather than a DVD drive may require the platform to be kept at a
> lower temperature)

"I've seen no evidence that this happens...", but I see the point.

> > If user changes them, it's his fault, he doesn't need to...
> > Make sure that trip points can only be lowered, compared to the
> > initially fetched one from BIOS.
>
> Surely people want this functionality so that they can raise trip
> points?

For Adrian it would be enough to be able to lower them. Also, being able to define a passive trip point (even if not provided by BIOS) could help a lot of machines.

What about at least:
- Be able to override passive cooling trip point
- If BIOS does not provide one, let user be able to define it

This should already make a lot of people happy.

Thomas
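The two-point minimum proposed here can be written down as a small validation rule. This Python sketch is an illustration under assumptions: the function name is hypothetical, the requirement that a passive trip point stay below the critical one is an added assumption rather than something stated in the thread, and the critical value used for the AOpen board is invented because the thread does not give it.

```python
def resolve_passive(bios_passive_c, critical_c, requested_c):
    """Return the passive trip point to use, or raise ValueError.

    bios_passive_c: firmware passive trip point, or None if the zone
                    does not provide one.
    critical_c:     firmware critical trip point.
    requested_c:    value asked for by the user, or None for no override.
    """
    if requested_c is None:
        return bios_passive_c
    # Assumption: passive cooling must kick in before critical shutdown.
    if requested_c >= critical_c:
        raise ValueError("passive trip point must stay below critical")
    # Lower-only rule, applied when firmware supplies a passive trip point.
    if bios_passive_c is not None and requested_c > bios_passive_c:
        raise ValueError("raising the firmware passive trip point is not allowed")
    return requested_c


# Toshiba from this thread: critical 105 C, no passive trip point.
print(resolve_passive(None, 105, 95))   # 95

# Omnibook xe3 from this thread: passive 83 C, critical 100 C.
print(resolve_passive(83, 100, 75))     # 75

# AOpen board: passive 50 C, wanted 65 C (critical of 100 C is assumed).
try:
    resolve_passive(50, 100, 65)
except ValueError as err:
    print("rejected:", err)
```

The last case shows the limit of a lower-only policy: it covers machines whose firmware thresholds are too high or missing, but not a board whose passive trip point is set unreasonably low.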
Re: 2.6.22 regression: thermal trip points
On Thu, 2007-08-02 at 13:15 +0100, Matthew Garrett wrote: > On Thu, Aug 02, 2007 at 02:06:26PM +0200, Thomas Renninger wrote: > > On Thu, 2007-08-02 at 12:57 +0100, Matthew Garrett wrote: > > > On Thu, Aug 02, 2007 at 12:59:47PM +0100, Alan Cox wrote: > > > > Windows as I understand it has vendor mechanisms to allow the bits > > > > shipped with the OS to override/ignore just about everything trip points > > > > included. Lots of hardware that requires fixups in Linux and just works > > > > in Windows is not Linux bugs but Windows magic .inf files and other > > > > registry gunge done by the machine vendor. We see this in ATA, in power > > > > management and elsewhere. > > > > > > I've seen no evidence that this happens with thermal trip points. > > > > WMI needed for fan control -- FSC Amilo M3438G > > http://bugzilla.kernel.org/show_bug.cgi?id=5670 > > That machine has no active thermal trip points, so I'm not sure how it's > relevant here. >From above: "Windows as I understand it has vendor mechanisms to..." Maybe thermal trip points are not influenced here, it's at least about thermal management and another prove that we cannot just try to copy Windows behavior, but need to provide workarounds wherever possible. Thomas > By the sounds of the bug log, I suspect Linux just runs > slightly hotter on the machine than Windows does - especially since the > user isn't running the closed nvidia driver, so there's nothing to carry > out any power management on the GPU. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22 regression: thermal trip points
On Thu, Aug 02, 2007 at 02:06:26PM +0200, Thomas Renninger wrote: > On Thu, 2007-08-02 at 12:57 +0100, Matthew Garrett wrote: > > On Thu, Aug 02, 2007 at 12:59:47PM +0100, Alan Cox wrote: > > > Windows as I understand it has vendor mechanisms to allow the bits > > > shipped with the OS to override/ignore just about everything trip points > > > included. Lots of hardware that requires fixups in Linux and just works > > > in Windows is not Linux bugs but Windows magic .inf files and other > > > registry gunge done by the machine vendor. We see this in ATA, in power > > > management and elsewhere. > > > > I've seen no evidence that this happens with thermal trip points. > > WMI needed for fan control -- FSC Amilo M3438G > http://bugzilla.kernel.org/show_bug.cgi?id=5670 That machine has no active thermal trip points, so I'm not sure how it's relevant here. By the sounds of the bug log, I suspect Linux just runs slightly hotter on the machine than Windows does - especially since the user isn't running the closed nvidia driver, so there's nothing to carry out any power management on the GPU. -- Matthew Garrett | [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22 regression: thermal trip points
> Andi Kleen wrote: > > > I don't think it's that unreasonable to require source code > modifications > > for anything that can kill hardware. At least that raises the barrier > > a bit and hopefully ensures people think twice about it and then really > > only blame themselves if anything goes wrong. > > Andi, would the above be mechanism sufficiently safe for your taste? No. -Andi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22 regression: thermal trip points
On Thu, 2007-08-02 at 12:57 +0100, Matthew Garrett wrote:
> On Thu, Aug 02, 2007 at 12:59:47PM +0100, Alan Cox wrote:
> > > I strongly suspect that the vast majority[1] of hardware that "needs"
> > > the trip points changing works perfectly well under Windows, so it's
> >
> > Windows as I understand it has vendor mechanisms to allow the bits
> > shipped with the OS to override/ignore just about everything trip points
> > included. Lots of hardware that requires fixups in Linux and just works
> > in Windows is not Linux bugs but Windows magic .inf files and other
> > registry gunge done by the machine vendor. We see this in ATA, in power
> > management and elsewhere.
>
> I've seen no evidence that this happens with thermal trip points.

WMI needed for fan control -- FSC Amilo M3438G
http://bugzilla.kernel.org/show_bug.cgi?id=5670

Thomas
Re: 2.6.22 regression: thermal trip points
> Set a taint flag, That's hardly any useful if the machine is dead afterwards. > print a loud message Neither. You'll just end up with "Linux destroyed my laptop" headlines all over the internet and rightfully very annoyed users. > Or have you forgotten the original Unix > philosophy too ? The philosophy didn't include physically destroying hardware as far as I know. > > > Here we had obviously-useful-to-you functionality which was taken away > > > without, afaik, providing any alternative. > > > > I don't think it's that unreasonable to require source code modifications > > for anything that can kill hardware. > > As root you can erase the bios, We don't ship the devbios driver for good reasons. > lock the hard disk with a random > password, reflash your video card That all requires significant effort and custom software. It's not that we have a one liner echo destroy > /sys/.../flash-bios. -Andi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22 regression: thermal trip points
On Thu, Aug 02, 2007 at 12:59:47PM +0100, Alan Cox wrote:
> > I strongly suspect that the vast majority[1] of hardware that "needs"
> > the trip points changing works perfectly well under Windows, so it's
>
> Windows as I understand it has vendor mechanisms to allow the bits
> shipped with the OS to override/ignore just about everything trip points
> included. Lots of hardware that requires fixups in Linux and just works
> in Windows is not Linux bugs but Windows magic .inf files and other
> registry gunge done by the machine vendor. We see this in ATA, in power
> management and elsewhere.

I've seen no evidence that this happens with thermal trip points.

--
Matthew Garrett | [EMAIL PROTECTED]
Re: 2.6.22 regression: thermal trip points
On Thu, Aug 02, 2007 at 01:45:00PM +0200, Thomas Renninger wrote:
> On Thu, 2007-08-02 at 12:13 +0100, Matthew Garrett wrote:
> > I strongly suspect that the vast majority[1] of hardware that "needs"
> > the trip points changing works perfectly well under Windows, so it's
> > likely to be papering over bugs in the kernel. It'd be nice if we fixed
> > those rather than encouraging people to poke stuff into /proc,
> Some arguments against that:
> - You cannot tell a customer: Wait for the kernel in half a year.
>   This is the time it at least needs until a laptop got sold, the
>   problem is found, a patch is written and checked in and finally
>   hits the distribution.

We have to do so frequently. New hardware often exposes bugs in the kernel.

> - You can also not backport fixes as ACPI patches mostly have the
>   potential to break other machines/BIOSes
> - There also exist the policy to not fix up/workaround totally broken
>   AML BIOS implementations

The policy has been to attempt to be bug-compatible with Windows whenever possible for some time now.

> - We do not need to and never will be able to copy or do the same
>   Windows is doing

Given that many vendors still only test against Windows, that's exactly what we need to do.

> > especially when doing so is guaranteed to break in really confusing ways
> > with a lot of hardware. The firmware can reset the trip points at
> > essentially arbitrary times and is well within its rights to expect the
> > OS to actually pay attention to them.
> What the hell is so wrong with:
>
> Let the user override the trip points. If he does so, ignore
> thermal trip point updates from BIOS. Don't care for hysteresis
> BIOS implementations (these are the BIOS trip point updates).

No, that's not the only reason for notifications. Alteration in hardware state may also force a recalculation of trip point (adding a battery to a bay rather than a DVD drive may require the platform to be kept at a lower temperature).

> If user changes them, it's his fault, he doesn't need to...
> Make sure that trip points can only be lowered, compared to the
> initially fetched one from BIOS.

Surely people want this functionality so that they can raise trip points?

-- 
Matthew Garrett | [EMAIL PROTECTED]
Re: 2.6.22 regression: thermal trip points
> I strongly suspect that the vast majority[1] of hardware that "needs"
> the trip points changing works perfectly well under Windows, so it's

Windows as I understand it has vendor mechanisms to allow the bits shipped with the OS to override/ignore just about everything trip points included. Lots of hardware that requires fixups in Linux and just works in Windows is not Linux bugs but Windows magic .inf files and other registry gunge done by the machine vendor. We see this in ATA, in power management and elsewhere.

Alan
Re: 2.6.22 regression: thermal trip points
On Thu, 2007-08-02 at 12:13 +0100, Matthew Garrett wrote:
> On Thu, Aug 02, 2007 at 12:02:21PM +0100, Alan Cox wrote:
> > > Anyway, only solution/workaround to use these machines with current
> > > kernels is to override trip points, maybe the patch should really just
> > > be reverted...
> >
> > The question really is whether the vendors will all revert it and carry
> > it as a patch or whether the main tree will accept reality on this one.
> >
> > Reverting it and adding a taint marker if you do it is much preferable I
> > suspect to having every vendor revert this bogus if well meaning
> > changeset.
>
> I strongly suspect that the vast majority[1] of hardware that "needs"
> the trip points changing works perfectly well under Windows, so it's
> likely to be papering over bugs in the kernel. It'd be nice if we fixed
> those rather than encouraging people to poke stuff into /proc,

Some arguments against that:
- You cannot tell a customer: Wait for the kernel in half a year.
  This is the time it at least needs until a laptop got sold, the
  problem is found, a patch is written and checked in and finally
  hits the distribution.
- You can also not backport fixes as ACPI patches mostly have the
  potential to break other machines/BIOSes
- There also exist the policy to not fix up/workaround totally broken
  AML BIOS implementations
- We do not need to and never will be able to copy or do the same
  Windows is doing
- ...

> especially when doing so is guaranteed to break in really confusing ways
> with a lot of hardware. The firmware can reset the trip points at
> essentially arbitrary times and is well within its rights to expect the
> OS to actually pay attention to them.

What the hell is so wrong with:

Let the user override the trip points. If he does so, ignore thermal trip point updates from BIOS. Don't care for hysteresis BIOS implementations (these are the BIOS trip point updates). If user changes them, it's his fault, he doesn't need to... Make sure that trip points can only be lowered, compared to the initially fetched one from BIOS.

This is neither confusing, nor dangerous in any way (beside the fact that the critical trip point might get dynamically lowered by BIOS, which is totally insane).

   Thomas

>
> [1] Some hardware is simply broken. We don't carry phc just because some
> vendors put the wrong voltage values in their tables, either
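The policy proposed in this message (accept a user override only if it lowers a trip point, then stop honouring BIOS updates) can be modelled in a few lines. This is only a user-space sketch under stated assumptions, not kernel code: the class `ThermalZone` and its method names are hypothetical, and the 50 C passive / 100 C critical values are the ones Knut reported for his board.

```python
class ThermalZone:
    """Toy user-space model of one thermal zone (degrees Celsius).

    Hypothetical illustration of the proposed policy; not the kernel's API.
    """

    def __init__(self, bios_trip_points):
        # Remember what the BIOS reported first; overrides are checked against it.
        self.bios_initial = dict(bios_trip_points)
        self.current = dict(bios_trip_points)
        self.user_overridden = False

    def user_override(self, name, value):
        """Accept a user value only if it does not exceed the initial BIOS value."""
        if name not in self.bios_initial:
            raise KeyError(name)
        if value > self.bios_initial[name]:
            raise ValueError(
                f"{name}: {value} C is above the initial BIOS value "
                f"{self.bios_initial[name]} C"
            )
        self.current[name] = value
        self.user_overridden = True

    def bios_update(self, name, value):
        """Handle a firmware trip point notification.

        Returns True if the update was applied, False if it was ignored
        because the user has already overridden the trip points.
        """
        if self.user_overridden:
            return False
        self.current[name] = value
        return True


zone = ThermalZone({"passive": 50, "critical": 100})
zone.user_override("passive", 45)      # lowering is accepted
ignored = not zone.bios_update("passive", 48)
```

With these rules a request to raise the passive trip point from 50 C to 65 C is rejected, which is exactly the case Knut needs; that is the objection raised against the lower-only check elsewhere in the thread.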
Re: 2.6.22 regression: thermal trip points
Thomas Renninger wrote:
>> mainboard: AOpen i915GMm-hfs, AWARD BIOS
>> cpu: Pentium-M 750 (0.8 to 1.86 GHz)
>> openSuSE 10.2 with kernel 2.6.22.1
> Is this a DELL laptop that gets throttled by 75% to throttling state 6
> if 60 degrees are exceeded?

No, it is a Pentium M desktop board: Chipset i915GM, FSB 533MHz, max 2GB DDR2 RAM, 2 PCI and 1 16x PCI Express slots, serial, parallel, usb, firewire, 2x Marvell Gigabit Ethernet, Realtek ALC 880 sound, IDE, Intel SATA and SiI SATA Raid, FDC, DVI and VGA video out etc. Very low power consumption: ~40W to 65W for the whole system, except monitor.

> As 2.6.22 was shipped without, I think reverting is not a real option.

Well, it would not be the first time a regression was eliminated by reverting a patch after it had been accepted.

>> Sanity checks that trip points only can get lowered (compared to initial
>> provided ones) needs to be added.
>> Len, Rui: For short-term can some

But I _need_ to raise the unreasonably low passive trip point. We could decide to protect the innocent user by allowing write access to trip_points only after a previous

echo "I know what I am doing" > /proc/acpi/thermal_zone/THRM/enable_really_dangerous_options

if we believe that this is a good idea ...

Andi Kleen wrote:
> I don't think it's that unreasonable to require source code modifications
> for anything that can kill hardware. At least that raises the barrier
> a bit and hopefully ensures people think twice about it and then really
> only blame themselves if anything goes wrong.

Andi, would the above mechanism be sufficiently safe for your taste?

cu,
 Knut
Re: 2.6.22 regression: thermal trip points
On Thu, Aug 02, 2007 at 12:02:21PM +0100, Alan Cox wrote:
> > Anyway, only solution/workaround to use these machines with current
> > kernels is to override trip points, maybe the patch should really just
> > be reverted...
>
> The question really is whether the vendors will all revert it and carry
> it as a patch or whether the main tree will accept reality on this one.
>
> Reverting it and adding a taint marker if you do it is much preferable I
> suspect to having every vendor revert this bogus if well meaning
> changeset.

I strongly suspect that the vast majority[1] of hardware that "needs" the trip points changing works perfectly well under Windows, so it's likely to be papering over bugs in the kernel. It'd be nice if we fixed those rather than encouraging people to poke stuff into /proc, especially when doing so is guaranteed to break in really confusing ways with a lot of hardware. The firmware can reset the trip points at essentially arbitrary times and is well within its rights to expect the OS to actually pay attention to them.

[1] Some hardware is simply broken. We don't carry phc just because some vendors put the wrong voltage values in their tables, either

-- 
Matthew Garrett | [EMAIL PROTECTED]
Re: 2.6.22 regression: thermal trip points
> Anyway, only solution/workaround to use these machines with current
> kernels is to override trip points, maybe the patch should really just
> be reverted...

The question really is whether the vendors will all revert it and carry it as a patch or whether the main tree will accept reality on this one.

Reverting it and adding a taint marker if you do it is much preferable I suspect to having every vendor revert this bogus if well meaning changeset.
Re: 2.6.22 regression: thermal trip points
> Also it runs the system out of spec and is similar to overclocking
> which we also do not support.

We do not systematically prevent overclocking. There are lots of cases where altering the trip points is helpful, and if you look in vendor bugzilla databases there are multiple moans from people whose laptops now run slow, or in many cases are simply unusable as a result of Len's change.

Given you can achieve some of the same result by not loading the relevant ACPI code in the first place your argument makes no rational sense at all. Set a taint flag, print a loud message but don't stop users actually doing things they intend as root. Or have you forgotten the original Unix philosophy too?

> > Here we had obviously-useful-to-you functionality which was taken away
> > without, afaik, providing any alternative.
>
> I don't think it's that unreasonable to require source code modifications
> for anything that can kill hardware.

As root you can erase the bios, lock the hard disk with a random password, reflash your video card

Sorry Andi, you simply do not know better than all end users.

Alan
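The alternative argued for here (accept the override, but set a taint flag and print a loud message) differs from a hard refusal only in what happens at write time. A minimal user-space sketch of that behaviour follows; it is not kernel code, and the class `TaintingZone`, the logger name and the message text are all hypothetical.

```python
import logging

log = logging.getLogger("thermal-sketch")  # hypothetical logger name


class TaintingZone:
    """Toy model: overrides are always accepted, but never silently."""

    def __init__(self, bios_trip_points):
        self.trip_points = dict(bios_trip_points)
        self.tainted = False

    def override(self, name, value):
        """Apply the user's value, mark the system tainted and warn loudly."""
        if name not in self.trip_points:
            raise KeyError(name)
        old = self.trip_points[name]
        self.trip_points[name] = value
        self.tainted = True
        log.warning(
            "trip point %s overridden by user: %d C -> %d C, system marked tainted",
            name, old, value,
        )


zone = TaintingZone({"passive": 50, "critical": 100})
zone.override("passive", 65)  # raising is allowed, but recorded
```

The point of the flag is that a later bug report carries the evidence that the trip points were changed, so the user keeps the capability and the maintainers keep the ability to discount such reports.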
Re: 2.6.22 regression: thermal trip points
On Thu, 2007-08-02 at 11:45 +0200, Adrian Schröter wrote:
> On Thursday 02 August 2007 11:42:27 wrote Thomas Renninger:
> > On Thu, 2007-08-02 at 10:40 +0200, Knut Petersen wrote:
> > > Hi everybody!
> > >
> > > Kernel 2.6.22 decreases performance by about 50% on my system.
> > > No, I do not like that. The reason is a broken BIOS, granted, but there
> > > was a perfect workaround in the kernel that has been dropped.
> > >
> > > mainboard: AOpen i915GMm-hfs, AWARD BIOS
> > > cpu: Pentium-M 750 (0.8 to 1.86 GHz)
> > > openSuSE 10.2 with kernel 2.6.22.1
> >
> > Is this a DELL laptop that gets throttled by 75% to throttling state 6
> > if 60 degrees are exceeded?
> > Adrian has such a machine..., no idea what is going on with that one,
> > but only workaround to get any use out of this machine is to override at
> > least the passive trip point.
>
> JFYI, there are plenty of these systems around, it was one out of four
> standard Novell models. I am maybe just the first one who uses Factory on
> it, but expect more bugreports when 10.3 gets released ...

Oops. So this is not broken HW/BIOS, but definitely a kernel problem?
The only idea that comes to my mind for finding this is to grep through the DSDT and look out for code that accesses CPU throttling HW ports. Maybe the ACPI subsystem gets something wrong when processing this code and activates throttling by accident?

Anyway, only solution/workaround to use these machines with current kernels is to override trip points, maybe the patch should really just be reverted...

   Thomas
Re: 2.6.22 regression: thermal trip points
Andrew Morton <[EMAIL PROTECTED]> writes:

> I didn't understand the arguments either, actually.

The issue is that you can actually kill hardware by setting this wrong. We've had such cases where trip point problems eventually lead to overheated laptops with hard disks dying etc.

Also it runs the system out of spec and is similar to overclocking which we also do not support.

> Here we had obviously-useful-to-you functionality which was taken away
> without, afaik, providing any alternative.

I don't think it's that unreasonable to require source code modifications for anything that can kill hardware. At least that raises the barrier a bit and hopefully ensures people think twice about it and then really only blame themselves if anything goes wrong.

-Andi
Re: 2.6.22 regression: thermal trip points
On Thursday 02 August 2007 11:42:27 wrote Thomas Renninger:
> On Thu, 2007-08-02 at 10:40 +0200, Knut Petersen wrote:
> > Hi everybody!
> >
> > Kernel 2.6.22 decreases performance by about 50% on my system.
> > No, I do not like that. The reason is a broken BIOS, granted, but there
> > was a perfect workaround in the kernel that has been dropped.
> >
> > mainboard: AOpen i915GMm-hfs, AWARD BIOS
> > cpu: Pentium-M 750 (0.8 to 1.86 GHz)
> > openSuSE 10.2 with kernel 2.6.22.1
>
> Is this a DELL laptop that gets throttled by 75% to throttling state 6
> if 60 degrees are exceeded?
> Adrian has such a machine..., no idea what is going on with that one,
> but only workaround to get any use out of this machine is to override at
> least the passive trip point.

JFYI, there are plenty of these systems around, it was one out of four standard Novell models. I am maybe just the first one who uses Factory on it, but expect more bugreports when 10.3 gets released ...

bye
adrian

-- 
Adrian Schroeter
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
email: [EMAIL PROTECTED]
Re: 2.6.22 regression: thermal trip points
On Thu, 2007-08-02 at 10:40 +0200, Knut Petersen wrote:
> Hi everybody!
>
> Kernel 2.6.22 decreases performance by about 50% on my system.
> No, I do not like that. The reason is a broken BIOS, granted, but there
> was a perfect workaround in the kernel that has been dropped.
>
> mainboard: AOpen i915GMm-hfs, AWARD BIOS
> cpu: Pentium-M 750 (0.8 to 1.86 GHz)
> openSuSE 10.2 with kernel 2.6.22.1

Is this a DELL laptop that gets throttled by 75% to throttling state 6 if 60 degrees are exceeded?
Adrian has such a machine..., no idea what is going on with that one, but only workaround to get any use out of this machine is to override at least the passive trip point.

>
> The cpu fan can not be controlled by linux kernel.
> The BIOS will switch on the cpu fan a bit above 50 deg. Celsius.
> The active and passive trip points both are set to 50 deg. Celsius.
> Temperature of the idle cpu at 800 MHz: 34 to 42 deg. C.
> The BIOS never changes the trip points.
> Cpufreq does work perfectly.
>
> Previously there was the possibility to add something like
>
> echo "100:0:65:70:0" > /proc/acpi/thermal_zone/THRM/trip_points
> echo 2 > /proc/acpi/thermal_zone/THRM/polling_frequency
> echo ondemand > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
>
> to e.g. /etc/init.d/boot.local. With 2.6.22 that solution does not exist
> any longer. Now the code in thermal.c slows down the cpu under load
> to prevent "overheating". Kernel compile time increases from about 12
> to 18 minutes. No, I don't like that, nobody would.
>
> Possible solutions:
>
> 1. Get a better BIOS! --- There is none.
>
> 2. Fix DSDT! --- Recompiling gives a number of errors ... I do not know
> how to fix it.
>
> 3. Don't include thermal.c! --- That does help, but as this is a 24/7
> system, the cpu fan could break. At that time I do not want to rely on
> the BIOS to save my system (the next trip point is at 100 deg. Celsius).
>
> 4. Revert Len Brown's commit 11ccc0f249cb01a129f54760b8ff087f242935d4
>
> I would vote for option 4, but I do understand some of the arguments of
> Len in the 2.6.22-rc1-mm1 discussion in May. Yes, communicating trip
> points to thermal.c is a hack, it will fail on systems that change trip
> points dynamically and it might be dangerous for the machine if
> unreasonable trip points are chosen.
> But it does help to keep the machine quiet, and to work around a too low
> or too trip points defined by the BIOS.
>
> If it should be not acceptable to revert the questionable commit without
> changes,

As 2.6.22 was shipped without, I think reverting is not a real option.

> would it be acceptable to make rw trip_points a kernel config option?

IMO something new should be added.
In the long term, maybe it's possible to marry ACPI thermal control with the hwmon interface; AFAIK there are already efforts to do so, but I don't know much about it.

Still, overriding trip points is a problem because the BIOS can change them at runtime...
IMO it should just be possible, and machines changing them at runtime either:
- do change the user's overrides
- or trip points are simply fixed after user has overridden them
  -> my favorite
  (Don't care for hysteresis BIOS implementations, if user changes them, it's his fault, he doesn't need to...)

Sanity checks that trip points can only get lowered (compared to the initially provided ones) need to be added.

Len, Rui: For short-term can something like that be added at least to the new sysfs interface (I am willing to help if this is a "would be nice to have, but no time, maybe later" issue)?

Especially passive trip point modification is IMO a powerful feature. You can easily build a passively cooled system, running at the performance level your cooling system allows (CPU frequency simply gets lowered before fans kick in).

Architectures other than ACPI-powered ones already make use of CPU frequency scaling. An ACPI-independent passive cooling implementation connecting thermal control (hwmon?) and the cpufreq interface might be desirable for the future? (could get tricky because ACPI spec has some special needs for passive cooling)

   Thomas
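For readers unfamiliar with the "special needs" mentioned here: the ACPI specification describes passive cooling as a periodic performance adjustment, ΔP[%] = _TC1 * (Tn - Tn-1) + _TC2 * (Tn - Tt), where Tn is the current temperature, Tn-1 the previous sample and Tt the passive trip point. The sketch below applies that formula in user space. It is an illustration only: the function names are hypothetical, the coefficient values tc1=1 and tc2=2 are made up for the example (real values come from the firmware's _TC1 and _TC2 objects), and the 65 C trip point is the override Knut used.

```python
def passive_delta_percent(t_now, t_prev, t_passive, tc1, tc2):
    """Performance reduction in percent for one sampling period.

    Positive result: throttle further. Negative result: performance may be restored.
    """
    return tc1 * (t_now - t_prev) + tc2 * (t_now - t_passive)


def next_performance(perf, t_now, t_prev, t_passive, tc1=1, tc2=2):
    """Apply one adjustment step and clamp the result to the range 0..100 percent."""
    delta = passive_delta_percent(t_now, t_prev, t_passive, tc1, tc2)
    return max(0, min(100, perf - delta))


# Temperature rose from 65 C to 67 C with the passive trip point at 65 C.
perf = next_performance(100, t_now=67, t_prev=65, t_passive=65)
```

Because the second term depends on the distance from the trip point, a passive trip point set well below the normal load temperature keeps the CPU throttled for as long as the load lasts, which matches the slowdown reported at the start of this thread.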
Re: 2.6.22 regression: thermal trip points
On Thu, 02 Aug 2007 10:40:44 +0200 Knut Petersen <[EMAIL PROTECTED]> wrote:

> Hi everybody!
>
> Kernel 2.6.22 decreases performance by about 50% on my system.
> No, I do not like that. The reason is a broken BIOS, granted, but there
> was a perfect workaround in the kernel that has been dropped.
>
> mainboard: AOpen i915GMm-hfs, AWARD BIOS
> cpu: Pentium-M 750 (0.8 to 1.86 GHz)
> openSuSE 10.2 with kernel 2.6.22.1
>
> The cpu fan can not be controlled by linux kernel.
> The BIOS will switch on the cpu fan a bit above 50 deg. Celsius.
> The active and passive trip points both are set to 50 deg. Celsius.
> Temperature of the idle cpu at 800 MHz: 34 to 42 deg. C.
> The BIOS never changes the trip points.
> Cpufreq does work perfectly.
>
> Previously there was the possibility to add something like
>
> echo "100:0:65:70:0" > /proc/acpi/thermal_zone/THRM/trip_points
> echo 2 > /proc/acpi/thermal_zone/THRM/polling_frequency
> echo ondemand > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
>
> to e.g. /etc/init.d/boot.local. With 2.6.22 that solution does not exist
> any longer. Now the code in thermal.c slows down the cpu under load
> to prevent "overheating". Kernel compile time increases from about 12
> to 18 minutes. No, I don't like that, nobody would.
>
> Possible solutions:
>
> 1. Get a better BIOS! --- There is none.
>
> 2. Fix DSDT! --- Recompiling gives a number of errors ... I do not know
> how to fix it.
>
> 3. Don't include thermal.c! --- That does help, but as this is a 24/7
> system, the cpu fan could break. At that time I do not want to rely on
> the BIOS to save my system (the next trip point is at 100 deg. Celsius).
>
> 4. Revert Len Brown's commit 11ccc0f249cb01a129f54760b8ff087f242935d4
>
> I would vote for option 4, but I do understand some of the arguments of
> Len in the 2.6.22-rc1-mm1 discussion in May. Yes, communicating trip
> points to thermal.c is a hack, it will fail on systems that change trip
> points dynamically and it might be dangerous for the machine if
> unreasonable trip points are chosen.
> But it does help to keep the machine quiet, and to work around a too low
> or too trip points defined by the BIOS.

I didn't understand the arguments either, actually.

Here we had obviously-useful-to-you functionality which was taken away without, afaik, providing any alternative.

> If it should be not acceptable to revert the questionable commit without
> changes,
> would it be acceptable to make rw trip_points a kernel config option?

Well we obviously need to do _something_. And reverting that commit until we get a decent replacement in place sounds like a fine idea to me.
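The string written to trip_points in the quoted workaround is a colon-separated list of temperatures in degrees Celsius. The thread does not spell out the field order; the sketch below assumes critical:hot:passive:active[0]:active[1]..., which is consistent with the 100 C critical trip point mentioned in the report but should be checked against the pre-2.6.22 drivers/acpi/thermal.c before relying on it. The function name `parse_trip_points` is hypothetical.

```python
def parse_trip_points(text):
    """Split a trip_points string into named fields (degrees Celsius).

    Assumed field order: critical:hot:passive:active[0]:active[1]...
    """
    fields = [int(part) for part in text.strip().split(":")]
    if len(fields) < 3:
        raise ValueError("expected at least critical:hot:passive")
    critical, hot, passive, *active = fields
    return {"critical": critical, "hot": hot, "passive": passive, "active": active}


trips = parse_trip_points("100:0:65:70:0")
```

Read this way, the workaround leaves the critical trip point at 100 C and raises the passive trip point from the BIOS value of 50 C to 65 C, with the first active trip point at 70 C.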
Re: 2.6.22 regression: thermal trip points
On Thu, 02 Aug 2007 10:40:44 +0200 Knut Petersen [EMAIL PROTECTED] wrote: Hi everybody! Kernel 2.6.22 decreases performance by about 50% on my system. No, I do not like that. The reason is a broken BIOS, granted, but there was a perfect workaround in the kernel that has been dropped. mainboard: AOpen i915GMm-hfs, AWARD BIOS cpu: Pentium-M 750 (0.8 to 1.86 MHz) openSuSE 10.2 with kernel 2.6.22.1 The cpu fan can not be controled by linux kernel. The BIOS will switch on the cpu fan a bit above 50 deg. Celsius. The active and passive trip points both are set to 50 deg. Celsius. Temperature of the idle cpu at 800 Mhz: 34 to 42 deg. C. The BIOS never changes the trip points. Cpufreq does work perfectly. Previously there was the possibility to add something like echo 100:0:65:70:0 /proc/acpi/thermal_zone/THRM/trip_points echo 2 /proc/acpi/thermal_zone/THRM/polling_frequency echo ondemand /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor to e.g. /etc/init.d/boot.local. With 2.6.22 that solution does not exist any longer. Now the code in thermal.c slows down the cpu under load to prevent overheating. Kernel compile time increases from about 12 to 18 minutes. No, I don´t like that, nobody would. Possible solutions: 1. Get a better BIOS! --- There is none. 2. Fix DSDT! --- Recompiling gives a number of errors ... I do not know how to fix it. 3. Don´t include thermal.c! --- That does help, but as this is a 24/7 system, the cpu fan could break. At that time I do not want to rely on the BIOS to save my system (the next trip point is at 100 deg. Celsius). 4. Revert Len Browns commit 11ccc0f249cb01a129f54760b8ff087f242935d4 I would vote for option 4, but I do understand some of the arguments of Len in the 2.6.22-rc1-mm1 discussion in May. Yes, communicating trip points to thermal.c is a hack, it will fail on systems that change trip points dynamically and it might be dangerous for the machine if unreasonable trip points are chosen. 
But it does help to keep the machine quiet, and to work around a too low or too trip points defined by the BIOS. I didn't understand the arguments either, actually. Here we had obviously-useful-to-you functionality which was taken away without, afaik, providing any alternative. If it should be not acceptable to revert the questionable commit without changes, would it be acceptable to make rw trip_points a kernel config option? Well we obviously need to do _something_. And reverting that commit until we get a decent replacement in place sounds like a fine idea to me. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22 regression: thermal trip points
On Thu, 2007-08-02 at 11:45 +0200, Adrian Schröter wrote: On Thursday 02 August 2007 11:42:27 wrote Thomas Renninger: On Thu, 2007-08-02 at 10:40 +0200, Knut Petersen wrote: Hi everybody! Kernel 2.6.22 decreases performance by about 50% on my system. No, I do not like that. The reason is a broken BIOS, granted, but there was a perfect workaround in the kernel that has been dropped. mainboard: AOpen i915GMm-hfs, AWARD BIOS cpu: Pentium-M 750 (0.8 to 1.86 MHz) openSuSE 10.2 with kernel 2.6.22.1 Is this a DELL laptop that gets throttled by 75% to throttling state 6 if 60 degrees are exceeded? Adrian has such a machine..., no idea what is going on with that one, but only workaround to get any use out of this machine is to override at least the passive trip point. JFYI, there are plenty of these systems around, it was one out of four standard Novell modells. I am mabye just the first one who uses Factory on it, but expect more bugreports when 10.3 gets released ... Oops. So this is not broken HW/BIOS, but definitely a kernel problem? Only idea that comes to my mind finding this is to grep through the DSDT and look out for code that accesses CPU throttling HW ports. Maybe ACPI subsystem gets something wrong, processing this code and activating throttling by accident? Anyway, only solution/workaround to use these machines with current kernels is to override trip points, maybe the patch should really just be reverted... Thomas - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22 regression: thermal trip points
On Thu, 2007-08-02 at 10:40 +0200, Knut Petersen wrote: Hi everybody! Kernel 2.6.22 decreases performance by about 50% on my system. No, I do not like that. The reason is a broken BIOS, granted, but there was a perfect workaround in the kernel that has been dropped. mainboard: AOpen i915GMm-hfs, AWARD BIOS cpu: Pentium-M 750 (0.8 to 1.86 MHz) openSuSE 10.2 with kernel 2.6.22.1 Is this a DELL laptop that gets throttled by 75% to throttling state 6 if 60 degrees are exceeded? Adrian has such a machine..., no idea what is going on with that one, but only workaround to get any use out of this machine is to override at least the passive trip point. The cpu fan can not be controled by linux kernel. The BIOS will switch on the cpu fan a bit above 50 deg. Celsius. The active and passive trip points both are set to 50 deg. Celsius. Temperature of the idle cpu at 800 Mhz: 34 to 42 deg. C. The BIOS never changes the trip points. Cpufreq does work perfectly. Previously there was the possibility to add something like echo 100:0:65:70:0 /proc/acpi/thermal_zone/THRM/trip_points echo 2 /proc/acpi/thermal_zone/THRM/polling_frequency echo ondemand /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor to e.g. /etc/init.d/boot.local. With 2.6.22 that solution does not exist any longer. Now the code in thermal.c slows down the cpu under load to prevent overheating. Kernel compile time increases from about 12 to 18 minutes. No, I don´t like that, nobody would. Possible solutions: 1. Get a better BIOS! --- There is none. 2. Fix DSDT! --- Recompiling gives a number of errors ... I do not know how to fix it. 3. Don´t include thermal.c! --- That does help, but as this is a 24/7 system, the cpu fan could break. At that time I do not want to rely on the BIOS to save my system (the next trip point is at 100 deg. Celsius). 4. 
Revert Len Browns commit 11ccc0f249cb01a129f54760b8ff087f242935d4 I would vote for option 4, but I do understand some of the arguments of Len in the 2.6.22-rc1-mm1 discussion in May. Yes, communicating trip points to thermal.c is a hack, it will fail on systems that change trip points dynamically and it might be dangerous for the machine if unreasonable trip points are chosen. But it does help to keep the machine quiet, and to work around a too low or too trip points defined by the BIOS. If it should be not acceptable to revert the questionable commit without changes, As 2.6.22 was shipped without, I think reverting is not a real option. would it be acceptable to make rw trip_points a kernel config option? IMO something new should be added. On longterm, maybe it's possible to marriage ACPI thermal control with hwmon interface, AFAIK there are already efforts to do so, but I don't know much about it. Still overriding trip points is a problem because BIOS can change them at runtime... IMO it should just be possible and machines changing them at runtime either: - do change the user's overrides - or trip points are simply fixed after user has overridden them - my favorite (Don't care for hysteresis BIOS implementations, if user changes them, it's his fault, he doesn't need to...) Sanity checks that trip points only can get lowered (compared to initial provided ones) needs to be added. Len, Rui: For short-term can something like that be added at least to the new sysfs interface (I am willing to help if this is a would be nice to have, but no time, maybe later issue)? Especially passive trip point modification is IMO a powerful feature. You can easily build a passive cooled system, running at the performance level your cooling system allows (CPU frequency simply gets lowered before fans kick in). Other architectures than ACPI powered already make use of CPU frequency scaling. An ACPI independent passive cooling implementation connecting thermal control (hwmon?) 
and cpufreq interface should be desired for future? (could get tricky because ACPI spec has some special needs for passive cooling) Thomas - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22 regression: thermal trip points
On Thursday 02 August 2007 11:42:27 wrote Thomas Renninger: On Thu, 2007-08-02 at 10:40 +0200, Knut Petersen wrote: Hi everybody! Kernel 2.6.22 decreases performance by about 50% on my system. No, I do not like that. The reason is a broken BIOS, granted, but there was a perfect workaround in the kernel that has been dropped. mainboard: AOpen i915GMm-hfs, AWARD BIOS cpu: Pentium-M 750 (0.8 to 1.86 MHz) openSuSE 10.2 with kernel 2.6.22.1 Is this a DELL laptop that gets throttled by 75% to throttling state 6 if 60 degrees are exceeded? Adrian has such a machine..., no idea what is going on with that one, but only workaround to get any use out of this machine is to override at least the passive trip point. JFYI, there are plenty of these systems around, it was one out of four standard Novell modells. I am mabye just the first one who uses Factory on it, but expect more bugreports when 10.3 gets released ... bye adrian -- Adrian Schroeter SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg) email: [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22 regression: thermal trip points
Andrew Morton [EMAIL PROTECTED] writes: I didn't understand the arguments either, actually. The issue is that you can actually kill hardware by setting this wrong. We've had such cases where trip point problems eventually lead to overheated laptops with hard disks dying etc. Also it runs the system out of spec and is similar to overclocking which we also do not support. Here we had obviously-useful-to-you functionality which was taken away without, afaik, providing any alternative. I don't think it's that unreasonable to require source code modifications for anything that can kill hardware. At least that raises the barrier a bit and hopefully ensures people think twice about it and then really only blame themselves if anything goes wrong. -Andi - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22 regression: thermal trip points
> Also it runs the system out of spec and is similar to overclocking which we also do not support.

We do not systematically prevent overclocking. There are lots of cases where altering the trip points is helpful, and if you look in vendor bugzilla databases there are multiple moans from people whose laptops now run slow, or in many cases are simply unusable as a result of Len's change.

Given you can achieve some of the same result by not loading the relevant ACPI code in the first place your argument makes no rational sense at all. Set a taint flag, print a loud message but don't stop users actually doing things they intend as root. Or have you forgotten the original Unix philosophy too?

> > Here we had obviously-useful-to-you functionality which was taken away without, afaik, providing any alternative.
>
> I don't think it's that unreasonable to require source code modifications for anything that can kill hardware.

As root you can erase the bios, lock the hard disk with a random password, reflash your video card.

Sorry Andi, you simply do not know better than all end users.

Alan
Re: 2.6.22 regression: thermal trip points
> Anyway, only solution/workaround to use these machines with current kernels is to override trip points, maybe the patch should really just be reverted...

The question really is whether the vendors will all revert it and carry it as a patch or whether the main tree will accept reality on this one. Reverting it and adding a taint marker if you do it is much preferable I suspect to having every vendor revert this bogus if well meaning changeset.
Re: 2.6.22 regression: thermal trip points
On Thu, Aug 02, 2007 at 12:02:21PM +0100, Alan Cox wrote:
> > Anyway, only solution/workaround to use these machines with current kernels is to override trip points, maybe the patch should really just be reverted...
>
> The question really is whether the vendors will all revert it and carry it as a patch or whether the main tree will accept reality on this one. Reverting it and adding a taint marker if you do it is much preferable I suspect to having every vendor revert this bogus if well meaning changeset.

I strongly suspect that the vast majority[1] of hardware that needs the trip points changing works perfectly well under Windows, so it's likely to be papering over bugs in the kernel. It'd be nice if we fixed those rather than encouraging people to poke stuff into /proc, especially when doing so is guaranteed to break in really confusing ways with a lot of hardware. The firmware can reset the trip points at essentially arbitrary times and is well within its rights to expect the OS to actually pay attention to them.

[1] Some hardware is simply broken. We don't carry phc just because some vendors put the wrong voltage values in their tables, either

--
Matthew Garrett | [EMAIL PROTECTED]
Re: 2.6.22 regression: thermal trip points
Thomas Renninger wrote:
> > mainboard: AOpen i915GMm-hfs, AWARD BIOS
> > cpu: Pentium-M 750 (0.8 to 1.86 GHz)
> > openSuSE 10.2 with kernel 2.6.22.1
>
> Is this a DELL laptop that gets throttled by 75% to throttling state 6 if 60 degrees are exceeded?

No, it is a Pentium M desktop board: Chipset i915GM, FSB 533MHz, max 2GB DDR2 RAM, 2 PCI and 1 16x PCI Express slots, serial, parallel, usb, firewire, 2x Marvell Gigabit Ethernet, Realtek ALC 880 sound, IDE, Intel SATA and SiI SATA Raid, FDC, DVI and VGA video out etc. Very low power consumption: ~40W to 65W for the whole system, except monitor.

> As 2.6.22 was shipped without, I think reverting is not a real option.

Well, it would not be the first time to eliminate a regression by reverting a patch after it was accepted previously.

> Sanity checks that trip points only can get lowered (compared to initial provided ones) needs to be added. Len, Rui: For short-term can some

But I _need_ to raise the unreasonably low passive trip point. We could decide to protect the innocent user by allowing write access to trip_points only after a previous

    echo I know what I am doing > /proc/acpi/thermal_zone/THRM/enable_really_dangerous_options

if we believe that this is a good idea ...

Andi Kleen wrote:
> I don't think it's that unreasonable to require source code modifications for anything that can kill hardware. At least that raises the barrier a bit and hopefully ensures people think twice about it and then really only blame themselves if anything goes wrong.

Andi, would the above mechanism be sufficiently safe for your taste?

cu,
Knut
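Knut's gating idea can be modelled in a few lines. The Python below is only an illustrative sketch: the class and method names are hypothetical and do not correspond to any kernel interface. It shows writes to the trip points being refused until the confirmation string from the proposal has been written first.

```python
class GatedTripPoints:
    """Hypothetical model of a trip_points file guarded by an enable file."""

    CONFIRMATION = "I know what I am doing"  # string taken from the proposal

    def __init__(self, initial):
        self.trip_points = initial
        self.unlocked = False

    def write_enable(self, text):
        # Only the exact confirmation string unlocks write access.
        self.unlocked = text.strip() == self.CONFIRMATION

    def write_trip_points(self, text):
        # Refuse the write unless the enable file was written first.
        if not self.unlocked:
            raise PermissionError("trip_points is read-only until enabled")
        self.trip_points = text.strip()
```

The gate makes an accidental write harder; as the replies in this thread point out, it does nothing to stop root from deliberately writing an unsafe value.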
Re: 2.6.22 regression: thermal trip points
On Thu, 2007-08-02 at 12:13 +0100, Matthew Garrett wrote:
> On Thu, Aug 02, 2007 at 12:02:21PM +0100, Alan Cox wrote:
> > > Anyway, only solution/workaround to use these machines with current kernels is to override trip points, maybe the patch should really just be reverted...
> >
> > The question really is whether the vendors will all revert it and carry it as a patch or whether the main tree will accept reality on this one. Reverting it and adding a taint marker if you do it is much preferable I suspect to having every vendor revert this bogus if well meaning changeset.
>
> I strongly suspect that the vast majority[1] of hardware that needs the trip points changing works perfectly well under Windows, so it's likely to be papering over bugs in the kernel. It'd be nice if we fixed those rather than encouraging people to poke stuff into /proc,

Some arguments against that:
  - You cannot tell a customer: Wait for the kernel in half a year. This is the time it at least needs until a laptop got sold, the problem is found, a patch is written and checked in and finally hits the distribution.
  - You can also not backport fixes as ACPI patches mostly have the potential to break other machines/BIOSes
  - There also exist the policy to not fix up/workaround totally broken AML BIOS implementations
  - We do not need to and never will be able to copy or do the same Windows is doing
  - ...

> especially when doing so is guaranteed to break in really confusing ways with a lot of hardware. The firmware can reset the trip points at essentially arbitrary times and is well within its rights to expect the OS to actually pay attention to them.

What the hell is so wrong with:
Let the user override the trip points.
If he does so, ignore thermal trip point updates from BIOS. Don't care for hysteresis BIOS implementations (these are the BIOS trip point updates).
If user changes them, it's his fault, he doesn't need to...
Make sure that trip points can only be lowered, compared to the initially fetched one from BIOS.

This is neither confusing, nor dangerous in any way (beside the fact that the critical trip point might get dynamically lowered by BIOS, which is totally insane).

    Thomas

> [1] Some hardware is simply broken. We don't carry phc just because some vendors put the wrong voltage values in their tables, either
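The policy Thomas describes (user override allowed, BIOS updates ignored once the user has overridden, and no raising above the value first fetched from the BIOS) can be written down as a small state model. This is a sketch under stated assumptions, not kernel code: the class, its method names, and the use of plain degrees Celsius are all hypothetical.

```python
class TripPoint:
    """Hypothetical model of one trip point with a lower-only user override."""

    def __init__(self, bios_initial_c):
        self.bios_initial_c = bios_initial_c  # value first fetched from the BIOS
        self.value_c = bios_initial_c         # value currently in effect
        self.user_overridden = False

    def user_set(self, new_c):
        # The user may only lower the trip point relative to the initial value.
        if new_c > self.bios_initial_c:
            raise ValueError("trip point may only be lowered")
        self.value_c = new_c
        self.user_overridden = True

    def bios_update(self, new_c):
        # Once the user has overridden the value, BIOS updates are ignored.
        if not self.user_overridden:
            self.value_c = new_c
```

With an initial passive trip point of 83 C, `user_set(70)` succeeds and a later `bios_update(90)` leaves the value at 70, while `user_set(90)` is rejected.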
Re: 2.6.22 regression: thermal trip points
On Thu, Aug 02, 2007 at 01:45:00PM +0200, Thomas Renninger wrote:
> On Thu, 2007-08-02 at 12:13 +0100, Matthew Garrett wrote:
> > I strongly suspect that the vast majority[1] of hardware that needs the trip points changing works perfectly well under Windows, so it's likely to be papering over bugs in the kernel. It'd be nice if we fixed those rather than encouraging people to poke stuff into /proc,
>
> Some arguments against that:
> - You cannot tell a customer: Wait for the kernel in half a year. This is the time it at least needs until a laptop got sold, the problem is found, a patch is written and checked in and finally hits the distribution.

We have to do so frequently. New hardware often exposes bugs in the kernel.

> - You can also not backport fixes as ACPI patches mostly have the potential to break other machines/BIOSes
> - There also exist the policy to not fix up/workaround totally broken AML BIOS implementations

The policy has been to attempt to be bug-compatible with Windows whenever possible for some time now.

> - We do not need to and never will be able to copy or do the same Windows is doing

Given that many vendors still only test against Windows, that's exactly what we need to do.

> > especially when doing so is guaranteed to break in really confusing ways with a lot of hardware. The firmware can reset the trip points at essentially arbitrary times and is well within its rights to expect the OS to actually pay attention to them.
>
> What the hell is so wrong with:
> Let the user override the trip points.
> If he does so, ignore thermal trip point updates from BIOS. Don't care for hysteresis BIOS implementations (these are the BIOS trip point updates).

No, that's not the only reason for notifications. Alteration in hardware state may also force a recalculation of trip point (adding a battery to a bay rather than a DVD drive may require the platform to be kept at a lower temperature)

> If user changes them, it's his fault, he doesn't need to...
> Make sure that trip points can only be lowered, compared to the initially fetched one from BIOS.

Surely people want this functionality so that they can raise trip points?

--
Matthew Garrett | [EMAIL PROTECTED]
Re: 2.6.22 regression: thermal trip points
> I strongly suspect that the vast majority[1] of hardware that needs the trip points changing works perfectly well under Windows, so it's

Windows as I understand it has vendor mechanisms to allow the bits shipped with the OS to override/ignore just about everything trip points included. Lots of hardware that requires fixups in Linux and just works in Windows is not Linux bugs but Windows magic .inf files and other registry gunge done by the machine vendor. We see this in ATA, in power management and elsewhere.

Alan
Re: 2.6.22 regression: thermal trip points
On Thu, Aug 02, 2007 at 02:06:26PM +0200, Thomas Renninger wrote:
> On Thu, 2007-08-02 at 12:57 +0100, Matthew Garrett wrote:
> > On Thu, Aug 02, 2007 at 12:59:47PM +0100, Alan Cox wrote:
> > > Windows as I understand it has vendor mechanisms to allow the bits shipped with the OS to override/ignore just about everything trip points included. Lots of hardware that requires fixups in Linux and just works in Windows is not Linux bugs but Windows magic .inf files and other registry gunge done by the machine vendor. We see this in ATA, in power management and elsewhere.
> >
> > I've seen no evidence that this happens with thermal trip points.
>
> WMI needed for fan control -- FSC Amilo M3438G
> http://bugzilla.kernel.org/show_bug.cgi?id=5670

That machine has no active thermal trip points, so I'm not sure how it's relevant here. By the sounds of the bug log, I suspect Linux just runs slightly hotter on the machine than Windows does - especially since the user isn't running the closed nvidia driver, so there's nothing to carry out any power management on the GPU.

--
Matthew Garrett | [EMAIL PROTECTED]
Re: 2.6.22 regression: thermal trip points
On Thu, Aug 02, 2007 at 12:59:47PM +0100, Alan Cox wrote:
> > I strongly suspect that the vast majority[1] of hardware that needs the trip points changing works perfectly well under Windows, so it's
>
> Windows as I understand it has vendor mechanisms to allow the bits shipped with the OS to override/ignore just about everything trip points included. Lots of hardware that requires fixups in Linux and just works in Windows is not Linux bugs but Windows magic .inf files and other registry gunge done by the machine vendor. We see this in ATA, in power management and elsewhere.

I've seen no evidence that this happens with thermal trip points.

--
Matthew Garrett | [EMAIL PROTECTED]
Re: 2.6.22 regression: thermal trip points
> Set a taint flag,

That's hardly any useful if the machine is dead afterwards.

> print a loud message

Neither. You'll just end up with "Linux destroyed my laptop" headlines all over the internet and rightfully very annoyed users.

> Or have you forgotten the original Unix philosophy too?

The philosophy didn't include physically destroying hardware as far as I know.

> > > Here we had obviously-useful-to-you functionality which was taken away without, afaik, providing any alternative.
> >
> > I don't think it's that unreasonable to require source code modifications for anything that can kill hardware.
>
> As root you can erase the bios,

We don't ship the devbios driver for good reasons.

> lock the hard disk with a random password, reflash your video card

That all requires significant effort and custom software. It's not that we have a one liner echo destroy > /sys/.../flash-bios.

-Andi
Re: 2.6.22 regression: thermal trip points
> Andi Kleen wrote:
> > I don't think it's that unreasonable to require source code modifications for anything that can kill hardware. At least that raises the barrier a bit and hopefully ensures people think twice about it and then really only blame themselves if anything goes wrong.
>
> Andi, would the above mechanism be sufficiently safe for your taste?

No.

-Andi
Re: 2.6.22 regression: thermal trip points
On Thu, 2007-08-02 at 12:57 +0100, Matthew Garrett wrote:
> On Thu, Aug 02, 2007 at 12:59:47PM +0100, Alan Cox wrote:
> > > I strongly suspect that the vast majority[1] of hardware that needs the trip points changing works perfectly well under Windows, so it's
> >
> > Windows as I understand it has vendor mechanisms to allow the bits shipped with the OS to override/ignore just about everything trip points included. Lots of hardware that requires fixups in Linux and just works in Windows is not Linux bugs but Windows magic .inf files and other registry gunge done by the machine vendor. We see this in ATA, in power management and elsewhere.
>
> I've seen no evidence that this happens with thermal trip points.

WMI needed for fan control -- FSC Amilo M3438G
http://bugzilla.kernel.org/show_bug.cgi?id=5670

    Thomas
Re: 2.6.22 regression: thermal trip points
On Thu, 2007-08-02 at 12:56 +0100, Matthew Garrett wrote:
> On Thu, Aug 02, 2007 at 01:45:00PM +0200, Thomas Renninger wrote:
> > On Thu, 2007-08-02 at 12:13 +0100, Matthew Garrett wrote:
> > > I strongly suspect that the vast majority[1] of hardware that needs the trip points changing works perfectly well under Windows, so it's likely to be papering over bugs in the kernel. It'd be nice if we fixed those rather than encouraging people to poke stuff into /proc,
> >
> > Some arguments against that:
> > - You cannot tell a customer: Wait for the kernel in half a year. This is the time it at least needs until a laptop got sold, the problem is found, a patch is written and checked in and finally hits the distribution.
>
> We have to do so frequently. New hardware often exposes bugs in the kernel.

And often we can provide a boot param or whatever, that makes it at least useable.

> > - You can also not backport fixes as ACPI patches mostly have the potential to break other machines/BIOSes
> > - There also exist the policy to not fix up/workaround totally broken AML BIOS implementations
>
> The policy has been to attempt to be bug-compatible with Windows whenever possible for some time now.

*whenever possible*

> > - We do not need to and never will be able to copy or do the same Windows is doing
>
> Given that many vendors still only test against Windows, that's exactly what we need to do.

But we cannot (copy all windows (mis-)behavior).

> > > especially when doing so is guaranteed to break in really confusing ways with a lot of hardware. The firmware can reset the trip points at essentially arbitrary times and is well within its rights to expect the OS to actually pay attention to them.
> >
> > What the hell is so wrong with:
> > Let the user override the trip points.
> > If he does so, ignore thermal trip point updates from BIOS. Don't care for hysteresis BIOS implementations (these are the BIOS trip point updates).
>
> No, that's not the only reason for notifications. Alteration in hardware state may also force a recalculation of trip point (adding a battery to a bay rather than a DVD drive may require the platform to be kept at a lower temperature)

I've seen no evidence that this happens..., but I see the point.

> > If user changes them, it's his fault, he doesn't need to...
> > Make sure that trip points can only be lowered, compared to the initially fetched one from BIOS.
>
> Surely people want this functionality so that they can raise trip points?

For Adrian it would be enough to be able to lower them. Also being able to define a passive trip point (even if not provided by BIOS) could help a lot of machines.

What about at least:
  - Be able to override passive cooling trip point
  - If BIOS does not provide one, let user be able to define it

This should already make a lot of people happy.

    Thomas
Re: 2.6.22 regression: thermal trip points
On Thu, Aug 02, 2007 at 02:35:18PM +0200, Thomas Renninger wrote:
> On Thu, 2007-08-02 at 13:15 +0100, Matthew Garrett wrote:
> > That machine has no active thermal trip points, so I'm not sure how it's relevant here.
>
> From above:
> Windows as I understand it has vendor mechanisms to...
> Maybe thermal trip points are not influenced here, it's at least about thermal management and another proof that we cannot just try to copy Windows behavior, but need to provide workarounds wherever possible.

There's absolutely no evidence in the bug log there that the user's problems are in any way due to Windows-specific code. The SetSilentMode stuff is an additional item of functionality that underclocks various bits of hardware, not one that's actually required for the platform to function correctly.

--
Matthew Garrett | [EMAIL PROTECTED]
Re: 2.6.22 regression: thermal trip points
On Thu, Aug 02, 2007 at 02:42:19PM +0200, Thomas Renninger wrote:
> On Thu, 2007-08-02 at 12:56 +0100, Matthew Garrett wrote:
> > The policy has been to attempt to be bug-compatible with Windows whenever possible for some time now.
>
> *whenever possible*

But there's no evidence whatsoever that this is something we can't handle...

> > No, that's not the only reason for notifications. Alteration in hardware state may also force a recalculation of trip point (adding a battery to a bay rather than a DVD drive may require the platform to be kept at a lower temperature)
>
> I've seen no evidence that this happens..., but I see the point.

It's explicitly mentioned as one of the use cases for trip point alteration in the spec.

> > Surely people want this functionality so that they can raise trip points?
>
> For Adrian it would be enough to be able to lower them.

Which suggests that we're probably doing something wrong at some more fundamental level...

> Also being able to define a passive trip point (even if not provided by BIOS) could help a lot of machines.

I agree that being able to lower trip points is unlikely to result in hardware damage, but still think that it's likely to be papering over genuine bugs that we could fix properly.

--
Matthew Garrett | [EMAIL PROTECTED]
Re: 2.6.22 regression: thermal trip points
> > Set a taint flag,
>
> That's hardly any useful if the machine is dead afterwards.

It won't be; the hardware will do a failsafe shutdown first.

> You'll just end up with "Linux destroyed my laptop" headlines all over the internet and rightfully very annoyed users.

You have to systematically sit down and tweak your machine.

> The philosophy didn't include physically destroying hardware as far as I know.

It most certainly did. With safety checks you could override.

> > As root you can erase the bios,
>
> We don't ship the devbios driver for good reasons.

That's debatably a bad reason (the user space API is wrong, that's all), and one that's totally inconsistent with some of the other drivers we do ship.

> > lock the hard disk with a random password, reflash your video card
>
> That all requires significant effort and custom software. It's not that we have a one liner echo destroy > /sys/.../flash-bios.

Well you can do the hard disk one in one line of perl, the video card one in a small bit of C. And this merely makes the argument that raising the trip points should be harder.

Alan
Re: 2.6.22 regression: thermal trip points
On Thu, 2007-08-02 at 13:15 +0100, Matthew Garrett wrote:
> On Thu, Aug 02, 2007 at 02:06:26PM +0200, Thomas Renninger wrote:
> > On Thu, 2007-08-02 at 12:57 +0100, Matthew Garrett wrote:
> > > On Thu, Aug 02, 2007 at 12:59:47PM +0100, Alan Cox wrote:
> > > > Windows as I understand it has vendor mechanisms to allow the bits shipped with the OS to override/ignore just about everything trip points included. Lots of hardware that requires fixups in Linux and just works in Windows is not Linux bugs but Windows magic .inf files and other registry gunge done by the machine vendor. We see this in ATA, in power management and elsewhere.
> > >
> > > I've seen no evidence that this happens with thermal trip points.
> >
> > WMI needed for fan control -- FSC Amilo M3438G
> > http://bugzilla.kernel.org/show_bug.cgi?id=5670
>
> That machine has no active thermal trip points, so I'm not sure how it's relevant here.

From above:
Windows as I understand it has vendor mechanisms to...
Maybe thermal trip points are not influenced here, it's at least about thermal management and another proof that we cannot just try to copy Windows behavior, but need to provide workarounds wherever possible.

    Thomas

> By the sounds of the bug log, I suspect Linux just runs slightly hotter on the machine than Windows does - especially since the user isn't running the closed nvidia driver, so there's nothing to carry out any power management on the GPU.
Re: 2.6.22 regression: thermal trip points
On Thu, Aug 02, 2007 at 02:04:42PM +0100, Alan Cox wrote:
> > > Set a taint flag,
> >
> > That's hardly any useful if the machine is dead afterwards.
>
> It won't be; the hardware will do a failsafe shutdown first.

Not necessarily. At SUSE we had at least one broken laptop with wrong trip points. The machine ran very hot for some time and afterwards the hard disk was dead.

-Andi
Re: 2.6.22 regression: thermal trip points
> > Andi, would the above mechanism be sufficiently safe for your taste?
>
> No.

I don't believe Andi's taste (or lack thereof) is relevant to this discussion. He has not, for example, explained why it's better to force people to disable all the ACPI power and thermal control on their system rather than adjust trip points.
Re: 2.6.22 regression: thermal trip points
Hi!

> > I didn't understand the arguments either, actually.
>
> The issue is that you can actually kill hardware by setting this wrong. We've had such cases where trip point problems eventually led to overheated laptops with hard disks dying etc.

Actually, that was my machine. Omnibook xe3; BIOS provided trip points *did* kill the disk. At least I was able to work around it with writing to trip points.

Yes, ACPI mandates emergency shutdown when critical+delta point is reached, *in hardware*. So this only endangers very broken machines, and it also fixes a lot of them.

    Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: 2.6.22 regression: thermal trip points
On Thu 2007-08-02 15:16:22, Andi Kleen wrote:
> On Thu, Aug 02, 2007 at 02:04:42PM +0100, Alan Cox wrote:
> > > > Set a taint flag,
> > >
> > > That's hardly any useful if the machine is dead afterwards.
> >
> > It won't be; the hardware will do a failsafe shutdown first.
>
> Not necessarily. At SUSE we had at least one broken laptop with wrong trip points. The machine ran very hot for some time and afterwards the hard disk was dead.

Yes, but it was original BIOS trip points that were wrong. And yes, its failsafe shutdown was too late. At least lowering the trip points would allow me to run it safely.

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: 2.6.22 regression: thermal trip points
Hi!

> Well, it would not be the first time to eliminate a regression by reverting a patch after it was accepted previously.
>
> > Sanity checks that trip points only can get lowered (compared to initial provided ones) needs to be added. Len, Rui: For short-term can some
>
> But I _need_ to raise the unreasonably low passive trip point. We could decide to protect the innocent user by allowing write access to trip_points only after a previous

Actually, you should lower your active trip point, and keep cpu temp below 50C.

> echo I know what I am doing > /proc/acpi/thermal_zone/THRM/enable_really_dangerous_options

No... but patch that only permits lowering could be acceptable.

    Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: 2.6.22 regression: thermal trip points
On Thu, Aug 02, 2007 at 03:57:54PM +, Pavel Machek wrote:
> On Thu 2007-08-02 15:16:22, Andi Kleen wrote:
> > On Thu, Aug 02, 2007 at 02:04:42PM +0100, Alan Cox wrote:
> > > > > Set a taint flag,
> > > >
> > > > That's hardly any useful if the machine is dead afterwards.
> > >
> > > It won't be; the hardware will do a failsafe shutdown first.
> >
> > Not necessarily. At SUSE we had at least one broken laptop with wrong trip points. The machine ran very hot for some time and afterwards the hard disk was dead.
>
> Yes, but it was original BIOS trip points that were wrong. And yes, its failsafe shutdown was too late. At least lowering the trip points would allow me to run it safely.

I have no problem with lowering them (in fact I proposed this to Thomas as a possible solution at some point). Just raising is a bad idea.

-Andi
Re: 2.6.22 regression: thermal trip points
On Thu, Aug 02, 2007 at 08:38:30PM +0200, Andi Kleen wrote:
> On Thu, Aug 02, 2007 at 03:57:54PM +, Pavel Machek wrote:
> > Yes, but it was original BIOS trip points that were wrong. And yes, its failsafe shutdown was too late. At least lowering the trip points would allow me to run it safely.
>
> I have no problem with lowering them (in fact I proposed this to Thomas as a possible solution at some point). Just raising is a bad idea.

Though for this to be reliable, you need to ignore any notifications that would raise the trip points while still paying attention to any that would lower them.

--
Matthew Garrett | [EMAIL PROTECTED]
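Matthew's condition amounts to taking the lower of the user's value and whatever the firmware currently reports. A minimal sketch, assuming a single trip point expressed in degrees Celsius and a hypothetical function name (nothing here is an existing kernel interface):

```python
def effective_trip_point(user_override_c, bios_current_c):
    """Return the trip point to enforce after a firmware notification.

    user_override_c: value set by the user, or None if no override exists.
    bios_current_c: value most recently reported by the firmware.
    """
    if user_override_c is None:
        # No override: follow the firmware.
        return bios_current_c
    # A firmware value above the override is ignored; a lower one is honoured.
    return min(user_override_c, bios_current_c)
```

With an override of 70, a firmware update to 83 leaves the effective value at 70, while a firmware update to 60 (for example after a hardware state change) brings it down to 60.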
Re: 2.6.22 regression: thermal trip points
Knut Petersen [EMAIL PROTECTED] writes:

> echo I know what I am doing > /proc/acpi/thermal_zone/THRM/enable_really_dangerous_options

There is a shorter version:

$ su
Password:
#

--
Krzysztof Halasa
Re: 2.6.22 regression: thermal trip points
On Thursday 02 August 2007 04:40, Knut Petersen wrote:
> Kernel 2.6.22 decreases performance by about 50% on my system. No, I do not like that. The reason is a broken BIOS, granted, but there was a perfect workaround in the kernel that has been dropped.
>
> mainboard: AOpen i915GMm-hfs, AWARD BIOS
> cpu: Pentium-M 750 (0.8 to 1.86 GHz)
> openSuSE 10.2 with kernel 2.6.22.1
>
> The cpu fan can not be controlled by linux kernel. The BIOS will switch on the cpu fan a bit above 50 deg. Celsius. The active and passive trip points both are set to 50 deg. Celsius. Temperature of the idle cpu at 800 MHz: 34 to 42 deg. C. The BIOS never changes the trip points. Cpufreq does work perfectly.
>
> Previously there was the possibility to add something like
>
>     echo 100:0:65:70:0 > /proc/acpi/thermal_zone/THRM/trip_points
>     echo 2 > /proc/acpi/thermal_zone/THRM/polling_frequency
>     echo ondemand > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
>
> to e.g. /etc/init.d/boot.local. With 2.6.22 that solution does not exist any longer. Now the code in thermal.c slows down the cpu under load to prevent overheating. Kernel compile time increases from about 12 to 18 minutes. No, I don't like that, nobody would.

Thanks for the sighting, Knut!

This regression is dramatic when put in the terms of 50% performance hit! I guess the good news is that thermal throttling is doing the job we are asking it to:-)

The statements above regarding the existence of active trip points and the kernel not being able to control the fan are inconsistent with each other.

Please open a sighting for this machine here:
http://bugzilla.kernel.org/enter_bug.cgi?product=ACPI
vs. Power-Thermal and attach the output from acpidump, cat /proc/acpi/thermal_zone/*/* and assign it to [EMAIL PROTECTED]

BTW. does the board boot and run properly with acpi=off?

thanks,
-Len
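For readers unfamiliar with the colon-separated string Knut writes to trip_points, the sketch below splits it into named fields. The field order (critical, hot, passive, then the active trip points) is an assumption inferred from the values in his example, not something stated in this thread, and the function name is hypothetical.

```python
def parse_trip_points(text):
    """Split a colon-separated trip point string into named fields.

    Assumed field order: critical:hot:passive:active0:active1...
    All values are whole degrees Celsius.
    """
    fields = [int(part) for part in text.strip().split(":")]
    if len(fields) < 3:
        raise ValueError("expected at least critical:hot:passive")
    critical, hot, passive = fields[:3]
    return {
        "critical": critical,
        "hot": hot,
        "passive": passive,
        "active": fields[3:],  # zero or more active trip points
    }
```

Under that assumption, Knut's string `100:0:65:70:0` reads as critical 100, hot 0, passive 65 and active trip points 70 and 0, i.e. the passive trip point raised from the BIOS value of 50 to 65.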
Re: 2.6.22 regression: thermal trip points
On Thursday 02 August 2007 05:45, Adrian Schröter wrote:
> On Thursday 02 August 2007 11:42:27 Thomas Renninger wrote:
> > On Thu, 2007-08-02 at 10:40 +0200, Knut Petersen wrote:
> > > Hi everybody!
> > >
> > > Kernel 2.6.22 decreases performance by about 50% on my system. No, I do not like that. The reason is a broken BIOS, granted, but there was a perfect workaround in the kernel that has been dropped.
> > >
> > > mainboard: AOpen i915GMm-hfs, AWARD BIOS
> > > cpu: Pentium-M 750 (0.8 to 1.86 GHz)
> > > openSuSE 10.2 with kernel 2.6.22.1
> >
> > Is this a DELL laptop that gets throttled by 75% to throttling state 6 if 60 degrees are exceeded? Adrian has such a machine..., no idea what is going on with that one, but only workaround to get any use out of this machine is to override at least the passive trip point.
>
> JFYI, there are plenty of these systems around, it was one out of four standard Novell models. I am maybe just the first one who uses Factory on it, but expect more bug reports when 10.3 gets released ...

That's very good news, Adrian. In the past all we had to go on was the memory of a machine that died several years ago. But if you've got a live failure, that is really valuable.

Please go here
http://bugzilla.kernel.org/enter_bug.cgi?product=ACPI
and submit a new sighting vs. Power-Thermal and attach the output from acpidump, cat /proc/acpi/thermal_zone/*/* and assign it to [EMAIL PROTECTED]

thanks,
-Len