https://bugzilla.kernel.org/show_bug.cgi?id=15946


Xavier Hourcade <public....@xapaho.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #73296|0                           |1
        is obsolete|                            |




--- Comment #83 from Xavier Hourcade <public....@xapaho.com>  2012-05-19 20:36:23 ---
Created an attachment (id=73332)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=73332)
acpi readings, dump, dmesg and heat watchdog (shutdown on May 16th 23:15)

An unattended shutdown occurred here again, on May 16th at 23:15. It was the
first time I was using this kernel (the latest stable in Fedora, as always):

  kernel-3.3.5-2.fc16.x86_64

At the time of the event, the system was "only" playing a Flash video full screen
(Firefox from Fedora, flash-plugin from Adobe, kmod-nvidia from RPM Fusion,
external display only, at 1680x1050: nothing huge).

I booted again immediately afterwards, and since then, well, "the devil has
been back on my side": the system has been running 24/7 with no issues, including
extremely heavy loads for several consecutive hours (I really pushed it).


What surprises me is that, at the time of the shutdown:
- my "heat.sh" script *had* time to detect the condition and call logger
- it did log "crit" events, not just once but *twice*
- again, there is a 0.5 s delay to estimate the average CPU load,
  hence my "crit" level was sustained for more than 1 second (and less than 1.5 s)
  before the system was powered off
- yet syslog/the kernel did *not* log anything!

"wt...?" :)


Please see the grepped /var/log/messages excerpt at the end of the attached report:

Line 7010:

* I had set the "notice" event level to the following temperature ranges:
  - CPU: 74..77 °C (as read by acpi -t, i.e. /sys/class/thermal)
  - GPU: 71..73 °C (as read by nvidia-smi -q | sed)

* I had set a 3-second pause when temperatures were at my "yawn" level,
  a reduced 2-second pause once the "notice" level was reached,
  and reduced pauses of 1 and 0 seconds for "warning" and "crit", respectively
  (plus the 0.5 s interval to measure average CPU usage, at every level)
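To make the thresholds above concrete, here is a minimal Python sketch of how a watchdog like heat.sh might map readings onto the "yawn"/"notice"/"warning"/"crit" levels. It is not the actual heat.sh, which is a shell script not shown here. Only the "notice" ranges come from this report; the "warning" upper bounds CPU_WARNING_MAX and GPU_WARNING_MAX are hypothetical placeholders, and taking the more severe of the CPU and GPU levels as the event level is an assumption. The CPU value is parsed from a sysfs thermal_zone*/temp file, which holds millidegrees Celsius.

```python
from __future__ import annotations

from pathlib import Path

# "notice" ranges in °C, taken from the report.
CPU_NOTICE = (74.0, 77.0)
GPU_NOTICE = (71.0, 73.0)
# Upper bounds of "warning": NOT given in the report, hypothetical values.
CPU_WARNING_MAX = 85.0
GPU_WARNING_MAX = 80.0

LEVELS = ["yawn", "notice", "warning", "crit"]


def read_zone_temp_c(path: Path) -> float:
    """Parse a sysfs thermal_zone*/temp file, which holds millidegrees Celsius."""
    return int(path.read_text().strip()) / 1000.0


def classify(temp_c: float, notice: tuple[float, float], warning_max: float) -> str:
    """Map one temperature onto the report's event levels."""
    notice_lo, notice_hi = notice
    if temp_c < notice_lo:
        return "yawn"
    if temp_c <= notice_hi:
        return "notice"
    if temp_c <= warning_max:
        return "warning"
    return "crit"


def event_level(cpu_c: float, gpu_c: float) -> str:
    """Assumption: the event level is the more severe of the CPU and GPU levels."""
    cpu = classify(cpu_c, CPU_NOTICE, CPU_WARNING_MAX)
    gpu = classify(gpu_c, GPU_NOTICE, GPU_WARNING_MAX)
    return max(cpu, gpu, key=LEVELS.index)


if __name__ == "__main__":
    zone = Path("/sys/class/thermal/thermal_zone0/temp")
    try:
        cpu_c = read_zone_temp_c(zone)
    except (OSError, ValueError):
        print("thermal_zone0 is not readable on this machine")
    else:
        print(cpu_c, classify(cpu_c, CPU_NOTICE, CPU_WARNING_MAX))
```

Whatever the real warning bounds are, a CPU reading of 127 lands in "crit" with any plausible value.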

Lines 7012 through 7026:

* Once the Flash video started playing full screen,
  "notice" was reached regularly (at most 6 times per minute),
  but never persistently (i.e. the temperature dropped back to "yawn" every time)

Lines 7027 through 7039:

* All of a sudden (i.e. within 3.5 seconds at most after the preceding
  "yawn" check, which had nothing to log), the event level switched
  straight to "crit" (no "notice" or "warning" in between), with the CPU
  temperature holding the magic value of 127 once again

* The system was under no particular load (for such usage) at the time of the bug.
  Even the GPU temperature was, well, rather low...
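The timing argument for lines 7027 through 7039 follows from the pause schedule given for line 7010. Here is a small Python sketch of it, assuming each iteration is "sample CPU load for 0.5 s, check, then pause according to the current level" (heat.sh itself is not shown, so this loop order is an assumption). Under that assumption, the nominal gap after a "yawn" check is 3 + 0.5 = 3.5 s, and a jump from "yawn" to "crit" skips both intermediate levels. Time spent in acpi, nvidia-smi and logger is not counted.

```python
from __future__ import annotations

# Pause (seconds) after a check at each level, as described in the report.
PAUSE_S = {"yawn": 3.0, "notice": 2.0, "warning": 1.0, "crit": 0.0}
# Every iteration also spends 0.5 s measuring average CPU load.
LOAD_SAMPLE_S = 0.5
LEVELS = ["yawn", "notice", "warning", "crit"]


def nominal_gap_s(prev_level: str) -> float:
    """Time between a check at prev_level and the next check, counting only
    the pause and the 0.5 s load sample (sensor and logger time excluded)."""
    return PAUSE_S[prev_level] + LOAD_SAMPLE_S


def skipped_levels(prev_level: str, cur_level: str) -> list[str]:
    """Levels jumped over between two consecutive checks (empty if none)."""
    lo, hi = LEVELS.index(prev_level), LEVELS.index(cur_level)
    return LEVELS[lo + 1:hi]


if __name__ == "__main__":
    # The event described above: a "yawn" check followed directly by "crit".
    print(nominal_gap_s("yawn"), skipped_levels("yawn", "crit"))
```

For the shutdown above, this gives a 3.5 s gap with "notice" and "warning" both skipped, which matches what the log shows.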


Does this help a little? This bug, and above all the absence of any
syslog/kernel event, really sucks :/

Please advise, thanks.

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
You are watching the assignee of the bug.
_______________________________________________
acpi-bugzilla mailing list
acpi-bugzilla@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/acpi-bugzilla
