Re: [PATCH] Hold reference to device_node during EEH event handling

Mike Mason Thu, 16 Jul 2009 09:39:51 -0700

Michael Ellerman wrote:

On Wed, 2009-07-15 at 14:43 -0700, Mike Mason wrote:

This patch increments the device_node reference counter when an EEH
error occurs and decrements the counter when the event has been
handled.  This is to prevent the device_node from being released until
eeh_event_handler() has had a chance to deal with the event.  We've
seen cases where the device_node is released too soon when an EEH
event occurs during a dlpar remove, causing the event handler to
attempt to access bad memory locations.
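
For readers unfamiliar with the pattern, here is a minimal userspace sketch of pinning a node while an event is pending. It is not the kernel code: struct node, struct event, node_get(), node_put(), event_init() and event_handle() are hypothetical stand-ins for device_node, the EEH event and of_node_get()/of_node_put(), and the "release" is only a flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct device_node; not the kernel type. */
struct node {
    atomic_int refcount;
    int released;   /* set when the last reference is dropped (demo only) */
};

/* Hypothetical stand-in for an EEH event carrying a node pointer. */
struct event {
    struct node *dn;
};

static struct node *node_get(struct node *n)
{
    atomic_fetch_add(&n->refcount, 1);
    return n;
}

static void node_put(struct node *n)
{
    /* fetch_sub returns the previous value; 1 means this was the last ref. */
    if (atomic_fetch_sub(&n->refcount, 1) == 1)
        n->released = 1;   /* a real release hook would free the node here */
}

/* Queue side: take a reference for as long as the event is pending. */
static void event_init(struct event *ev, struct node *n)
{
    ev->dn = node_get(n);
}

/* Handler side: use the node, then drop the event's reference. */
static void event_handle(struct event *ev)
{
    /* ... recovery work that dereferences ev->dn would go here ... */
    node_put(ev->dn);
    ev->dn = NULL;
}

/* Walks through the sequence described above; returns 0 on success. */
static int demo_pending_event(void)
{
    struct node n = { .refcount = 1, .released = 0 };  /* owner's reference */
    struct event ev;

    event_init(&ev, &n);    /* refcount is now 2 */
    node_put(&n);           /* owner drops its ref, as a remove might */
    assert(!n.released);    /* node is still pinned by the pending event */

    event_handle(&ev);      /* last reference dropped, release runs */
    assert(n.released);
    return 0;
}
```

The point of the sketch is only the ordering: the node cannot be released between queueing and handling, because the event itself holds a reference until the handler is done.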


Please review and let me know of any concerns.


Taking a reference sounds sane, but ...

Signed-off-by: Mike Mason <mm...@us.ibm.com>

--- a/arch/powerpc/platforms/pseries/eeh_event.c        2008-10-09 15:13:53.000000000 -0700
+++ b/arch/powerpc/platforms/pseries/eeh_event.c        2009-07-14 14:14:00.000000000 -0700
@@ -75,6 +75,14 @@ static int eeh_event_handler(void * dumm
        if (event == NULL)
                return 0;

+       /* EEH holds a reference to the device_node, so if it
+        * equals 1 it's no longer valid and the event should
+        * be ignored */
+       if (atomic_read(&event->dn->kref.refcount) == 1) {
+               of_node_put(event->dn);
+               return 0;
+       }


That's really gross :)


Agreed.  I'll look for another way to determine whether the device is gone and
the event should be ignored.  Suggestions are welcome :-)


And what happens if the refcount goes to 1 just after the check? i.e.
here:

        /* Serialize processing of EEH events */
        mutex_lock(&eeh_event_mutex);
        eeh_mark_slot(event->dn, EEH_MODE_RECOVERING);
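
To make that window concrete, here is a small userspace sketch of the check-then-act problem. It is an illustration under assumptions, not the kernel code: struct node, looks_alive() and other_path_put() are hypothetical names, and the interleaving is forced by hand in a single thread rather than produced by a real concurrent remove.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted device_node. */
struct node {
    atomic_int refcount;
};

/* The check from the patch, modelled in userspace: a count of 1 is taken
 * to mean "only the event still holds this node". */
static bool looks_alive(struct node *n)
{
    return atomic_load(&n->refcount) != 1;
}

/* Stand-in for another path (e.g. a remove) dropping its reference. */
static void other_path_put(struct node *n)
{
    atomic_fetch_sub(&n->refcount, 1);
}

/* Returns true if the handler went on to use a node that only the event
 * still references, i.e. the case the check was meant to filter out. */
static bool demo_check_then_act(void)
{
    struct node n = { .refcount = 2 };  /* event ref + one other holder */

    if (!looks_alive(&n))
        return false;           /* event would have been ignored */

    /* Window: nothing stops the other holder dropping its ref here. */
    other_path_put(&n);

    /* The handler proceeds, but the answer from the check is now stale. */
    return atomic_load(&n.refcount) == 1;
}
```

The sketch only shows that the value read at the check says nothing about the value a moment later; it does not propose a replacement test.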



cheers

