https://bugzilla.kernel.org/show_bug.cgi?id=15946
Xavier Hourcade <public....@xapaho.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #73296|0                           |1
        is obsolete|                            |

--- Comment #83 from Xavier Hourcade <public....@xapaho.com>  2012-05-19 20:36:23 ---
Created an attachment (id=73332)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=73332)
acpi readings, dump, dmesg and heat watchdog (shutdown on May 16th 23:15)

An unattended shutdown occurred here, again, on May 16th at 23:15, the first
time using this kernel (latest stable at Fedora, as always):

  kernel-3.3.5-2.fc16.x86_64

At the time of the event, the system was "only" playing a flash video full
screen (using firefox @fedora, flash-plugin @adobe, kmod-nvidia @rpmfusion,
using the external display only at 1680x1050 -- nothing huge).

I booted again immediately after that, and since then, well, "the devil has
been back on my side": the system has been running 24/7 with no issues,
including extremely heavy loads during several consecutive hours (I pushed it,
really).

What surprises me is that, at the time of the shutdown:

- my "heat.sh" script *had* the time to detect and call logger
- it did log "crit" events -- not just once, but even *twice*
- again, there is a 0.5s delay to estimate the CPU average load, hence my
  "crit" level was reached for more than 1 second (and less than 1.5s) before
  the system was powered off.
- syslog/kernel however... did *not* log anything! "wt...?" :)

Please see the grep'ed /var/log/messages (at the end of the attached report):

Line 7010:
  * I had set the "notice" event level to the following temperature ranges:
    - CPU: 74..77 °C (as read by acpi -t, i.e. /sys/class/thermal)
    - GPU: 71..73 °C (as read by nvidia-smi -q | sed)
  * I had set a 3 seconds pause when temperatures were at my "yawn" level,
    a reduced 2 seconds pause when the "notice" level was reached,
    a reduced 1 and 0 seconds for "warning" and "crit", respectively
    (plus the 0.5s lapse to measure CPU average usage, for every level)

Lines 7012 through 7026:
  * Since the flash video started to be played full screen, "notice" was
    reached on a regular basis (up to 6 times per minute, maximum), but never
    persistently (i.e. the temperature did drop again to "yawn", every time)

Lines 7027 through 7039:
  * All of a sudden (that means within a 3.5 seconds time lapse, maximum, after
    the preceding "yawn" event, which did not have to log anything), the event
    level switched to "crit" (no "notice" or "warning"), with the CPU
    temperature holding the 127 magic value once again
  * The system was under no particular load (for such usage) at the time of the
    bug. Even the GPU temperature was, well, rather low...

Does this help a little?

This bug -- and above everything, the absence of a syslog/kernel event --
really sucks :/

Please advise, thanks.
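
For readers without the attachment, the polling scheme described above can be sketched roughly as follows. This is not the actual heat.sh, only a minimal Python illustration of the level/pause logic. The "notice" bounds, the per-level pauses and the 0.5s load-sampling delay come from the comment; the "warning" and "crit" bounds, all function names, and the choice of thermal_zone0 are assumptions made for the example.

```python
# Levels in increasing severity, as named in the report.
LEVELS = ("yawn", "notice", "warning", "crit")

# Lower bounds per level, in degrees Celsius.
# "notice" comes from the report (CPU 74..77, GPU 71..73).
# "warning" and "crit" are NOT given in the report: assumed values.
CPU_BOUNDS = {"notice": 74, "warning": 78, "crit": 85}
GPU_BOUNDS = {"notice": 71, "warning": 74, "crit": 80}

# Pause before the next poll, per the report: 3 s, 2 s, 1 s, 0 s.
PAUSE_SECONDS = {"yawn": 3.0, "notice": 2.0, "warning": 1.0, "crit": 0.0}

# Time spent estimating the average CPU load on every cycle (from the report).
LOAD_SAMPLE_SECONDS = 0.5


def level_for(temp_c, bounds):
    """Return the highest level whose lower bound temp_c has reached."""
    level = "yawn"
    for name in ("notice", "warning", "crit"):
        if temp_c >= bounds[name]:
            level = name
    return level


def classify(cpu_c, gpu_c):
    """Overall level is the more severe of the CPU and GPU levels."""
    cpu_level = level_for(cpu_c, CPU_BOUNDS)
    gpu_level = level_for(gpu_c, GPU_BOUNDS)
    return max(cpu_level, gpu_level, key=LEVELS.index)


def cycle_seconds(level):
    """Total time between two consecutive readings at a given level."""
    return PAUSE_SECONDS[level] + LOAD_SAMPLE_SECONDS


def read_cpu_temp_c(path="/sys/class/thermal/thermal_zone0/temp"):
    """Read a sysfs thermal zone, which reports millidegrees Celsius.

    Which zone corresponds to the CPU is machine-specific; zone 0 is
    an assumption here.
    """
    with open(path) as f:
        return int(f.read().strip()) / 1000.0


if __name__ == "__main__":
    # A reading of 127 (the "magic value") jumps straight to "crit".
    print(classify(127, 60), cycle_seconds("crit"))
    print(classify(75, 60), cycle_seconds("notice"))
```

With these numbers a full "yawn" cycle lasts 3.5 seconds, which is where the "within 3.5 seconds, maximum" bound above comes from, and two consecutive "crit" log entries at 0.5s per cycle correspond to the "more than 1 second and less than 1.5s" estimate.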