On Tue, Apr 22, 2008 at 1:52 PM, Keisuke MORI <[EMAIL PROTECTED]> wrote:
> "Andrew Beekhof" <[EMAIL PROTECTED]> writes:
>  (snip)
>
> >>  Here's my observation:
>  >>
>  >>   - An element of pending_ops is removed at lrm.c:L947
>  >>   - It is called from inside g_hash_table_foreach() at L1475
>  >>   - This violates the usage of g_hash_table_foreach() according
>  >>    to the glib manual.
>  >>   - Therefore the iteration cannot proceed correctly and would
>  >>    try to refer to a removed element.
>  >
>  > Turns out that the Stateful resource in CTS was never getting promoted.
>  > Once I fixed this, I was able to trigger the bug too (in the last few minutes).
>
>  A weird thing is that it is not reproducible in every environment.
>
>  As far as we've tested:
>   - it _always_ happens on a RedHat 4 environment.
>   - it has _never_ happened on a RedHat 5 environment.
>
>  I'm not sure if it's the only difference, but the difference in
>  glib versions may be related to the behavior.

It takes a very specific timing of resource actions - specifically resource
demotion - to trigger the code path you identified.

It's more likely that RHEL5 was just "lucky".  That, or it's a common
mistake and they've taken measures to avoid it.
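
For anyone following along, here is a minimal sketch of the usual way
around this class of bug: let the iterator do the removal instead of the
callback. It is not the attached patch and not the crmd code. The table
type, the key format and every function name below are hypothetical
stand-ins, and it deliberately avoids glib so it compiles on its own.
glib offers the same contract in g_hash_table_foreach_remove(), where the
callback returns TRUE for each entry that should be dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for pending_ops: a small fixed-size array. */
#define MAX_OPS 8

struct op {
    const char *key;     /* made-up key format, e.g. "a:monitor" */
    const char *rsc_id;  /* resource the operation belongs to */
};

struct op_table {
    struct op ops[MAX_OPS];
    size_t count;
};

/* Callback contract: return true to ask the iterator to drop the entry.
 * The callback itself must not modify the table. */
typedef bool (*op_remove_fn)(const struct op *op, void *user_data);

/* Append an entry; returns false when the table is full. */
static bool op_table_add(struct op_table *t, const char *key,
                         const char *rsc_id)
{
    assert(t != NULL);
    if (t->count >= MAX_OPS) {
        return false;
    }
    t->ops[t->count].key = key;
    t->ops[t->count].rsc_id = rsc_id;
    t->count++;
    return true;
}

/* Visit every entry once and compact the survivors in place.
 * Only this function changes the table, so the position being visited
 * is never invalidated by the callback. Returns the number removed. */
static size_t op_table_foreach_remove(struct op_table *t, op_remove_fn fn,
                                      void *user_data)
{
    size_t kept = 0;
    size_t removed = 0;

    assert(t != NULL && fn != NULL);
    for (size_t i = 0; i < t->count; i++) {
        if (fn(&t->ops[i], user_data)) {
            removed++;
        } else {
            t->ops[kept++] = t->ops[i];
        }
    }
    t->count = kept;
    return removed;
}

/* Example predicate: select every operation of one resource. */
static bool op_belongs_to_rsc(const struct op *op, void *user_data)
{
    return strcmp(op->rsc_id, (const char *)user_data) == 0;
}
```

Calling op_table_foreach_remove(&table, op_belongs_to_rsc, "a") drops all
entries for the (made-up) resource "a" in one pass. The point of the design
is that the decision (the callback) and the mutation (the iterator) are
kept apart, which is exactly what a plain foreach plus a remove inside the
callback fails to do.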

>  > Thanks for your diagnosis and the patch, you've certainly saved me some time :-)
>  >
>  >>
>  >>  http://hg.linux-ha.org/lha-2.1/annotate/333aef5bd4ed/crm/crmd/lrm.c
>  >>  (...)
>  >>  946             /* not doing this will block the node from shutting down */
>  >>  947             g_hash_table_remove(pending_ops, key);
>  >>  (...)
>  >>  1475            g_hash_table_foreach(pending_ops, stop_recurring_action_by_rsc, rsc);
>  >>
>  >>  http://library.gnome.org/devel/glib/stable/glib-Hash-Tables.html#g-hash-table-foreach
>  >>  (...)
>  >>  The hash table may not be modified while iterating over it (you can't add/remove items).
>  >>
>  >>
>  >>  I have also attached my suggested patch. I cannot guarantee its
>  >>  correctness; it is just to show you the idea.
>  >>
>  >>  Thanks,
>
>  --
>  Keisuke MORI
>  NTT DATA Intellilink Corporation
>  _______________________________________________________
>  Linux-HA-Dev: [email protected]
>  http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
>  Home Page: http://linux-ha.org/
>
