http://bugzilla.kernel.org/show_bug.cgi?id=9772

           Summary: 2.6.24-rc8 + patches:  CPU hot removal while CPU is
                    online leaves system in bad state
           Product: ACPI
           Version: 2.5
     KernelVersion: 2.6.24-rc8 + patch for bug 2884
          Platform: All
        OS/Version: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: Config-Hotplug
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]


Latest working kernel version: Unknown
Earliest failing kernel version: Unknown
Distribution: sles10
Hardware Environment: x86_64
Software Environment:
Problem Description:
I've applied the patches attached to bug 2884 in kernel bugzilla so that I can
work around the deadlock that occurs when you write to the eject node for an
ACPI object.

The kernel doesn't properly handle the case of ejecting a CPU that is still
online.  Doing so leaves the CPU online: it still shows up in /proc/cpuinfo,
and the /sys/devices/system/cpu/cpuX node still exists.  But the kernel goes
ahead and calls the ACPI eject method anyway, which means the hardware is free
to be removed at that point.  It also cleans up the /sys nodes for the ACPI
tree, so it's not possible to request another ejection.  This leads to system
instability, as things gradually hang when they attempt to interact with the
CPU that has gone away.
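
The inconsistent state described above can be spotted from user space with
something like the following.  This is only a sketch: the function name
check_eject_state is made up for illustration, and it assumes the caller
already knows which logical CPU directory corresponds to which ACPI device
node (nothing here derives that mapping).

```shell
#!/bin/sh
# check_eject_state CPU_DIR ACPI_DIR   (hypothetical helper, not a kernel tool)
#   CPU_DIR  e.g. /sys/devices/system/cpu/cpu1
#   ACPI_DIR e.g. /sys/devices/LNXSYSTM:00/device:00/ACPI0007:01
# Reports the bad state: logical CPU still online, ACPI node already removed.
check_eject_state() {
    cpu_dir=$1
    acpi_dir=$2
    online=0
    if [ -d "$cpu_dir" ]; then
        if [ -r "$cpu_dir/online" ]; then
            online=$(cat "$cpu_dir/online")
        else
            # No "online" file: the CPU cannot be offlined, so count it as online.
            online=1
        fi
    fi
    if [ "$online" = "1" ] && [ ! -d "$acpi_dir" ]; then
        echo "inconsistent: CPU online but ACPI node removed"
        return 1
    fi
    echo "consistent"
    return 0
}
```

Both paths are passed in as arguments so the check can be tried against a
scratch directory before pointing it at the real /sys tree.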

Writing to the "eject" node for an online CPU should probably result in the
ejection request being ignored.  The /sys node for the ACPI device should be
left intact, so that another ejection can be requested after the CPU has been
taken offline.  The write to the eject sysfs node should probably fail with an
error, but that's optional.

Alternatively, the kernel could automatically offline an online CPU before
ejecting it.  I think it's a matter of taste which behavior you prefer.
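
Until the kernel does one or the other, both behaviors can be approximated
from user space.  The wrapper below is a sketch under assumptions, not a fix:
safe_eject and its "refuse"/"offline" policy names are invented for this
example, and the caller must supply the matching CPU directory and eject node
(the mapping between them is not worked out here).

```shell
#!/bin/sh
# safe_eject CPU_DIR EJECT_NODE [refuse|offline]   (hypothetical wrapper)
#   refuse  (default): fail if the CPU is still online
#   offline          : try to take the CPU offline first, then eject
safe_eject() {
    cpu_dir=$1
    eject_node=$2
    policy=${3:-refuse}

    if [ -r "$cpu_dir/online" ]; then
        state=$(cat "$cpu_dir/online")
    else
        # No "online" file: the CPU cannot be offlined, so treat it as online.
        state=1
    fi

    if [ "$state" = "1" ]; then
        if [ "$policy" = "offline" ] && [ -w "$cpu_dir/online" ]; then
            echo 0 > "$cpu_dir/online" || return 1
            # Re-read the state rather than trusting that the write took effect.
            state=$(cat "$cpu_dir/online")
        fi
        if [ "$state" != "0" ]; then
            echo "refusing eject: CPU still online" >&2
            return 1
        fi
    fi

    # Only reached once the CPU is offline.
    echo 1 > "$eject_node"
}
```

With the default policy the ACPI node is never touched while the CPU is
online, so a later eject request remains possible once the CPU has been
offlined.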

acpi_processor_remove looks like it returns -EINVAL if the CPU is online (see
acpi_processor_handle_eject), but as far as I can see this return value is
never checked, so the eject request is never stopped.

Steps to reproduce:
While the CPU is still online, run:
echo 1 > /sys/devices/LNXSYSTM:00/device:00/ACPI0007:01/eject
Watch your shell hang.  Wait a few minutes and watch the whole system hang.


