On Mon, Oct 25, 2010 at 08:21:26PM +0900, Keisuke MORI wrote:
> Hi,
>
> The current tip of heartbeat causes an assertion failure in
> pacemaker-1.0 and generates a core:
> {{{
> Oct 25 17:15:08 srv02 cib: [31333]: ERROR: crm_abort:
> crm_glib_handler: Forked child 31338 to record non-fatal assert at
> utils.c:449 : g_main_loop_is_running: assertion `loop != NULL' failed
> Oct 25 17:15:08 srv02 cib: [31333]: ERROR: crm_abort:
> crm_glib_handler: Forked child 31339 to record non-fatal assert at
> utils.c:449 : g_main_loop_is_running: assertion `loop != NULL' failed
> Oct 25 17:15:11 srv02 crmd: [31337]: ERROR: crm_abort:
> crm_glib_handler: Forked child 31341 to record non-fatal assert at
> utils.c:449 : g_main_loop_is_running: assertion `loop != NULL' failed
> Oct 25 17:15:11 srv02 crmd: [31337]: ERROR: crm_abort:
> crm_glib_handler: Forked child 31342 to record non-fatal assert at
> utils.c:449 : g_main_loop_is_running: assertion `loop != NULL' failed
> }}}
>
>
> This seems to have been introduced by the following changeset:
> http://hg.linux-ha.org/dev/rev/231b0b8555be
>
> The stack trace and my suggested patch are attached.
>
> The changeset in question switched to using get_next_random() here,
> which eventually calls g_main_loop_is_running(). That call can fail
> because the main loop is not initialized yet in cib/crmd.
>
> My suggested patch just restores the old behavior, changing only the
> delay to 50ms.

I don't care for the get_more_random() stuff, or for keeping 100
"random" values prepared for get_next_random(); that is probably
just academic sugar anyway.
If it does not work, we either fix it or throw it all out.

What I do object to is calling srand() many times.
We should really call it only once, and we still call it in
too many places.
get_next_random() appears to guard its srand() call properly with
a "static int inityet", so it seeds only once.
That is why I just used it.
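
To illustrate the seed-once pattern I mean, here is a minimal sketch.
It is not the heartbeat code: next_random_seeded_once(),
rand_from_interval_sketch() and the seed_calls counter are made-up names,
and seeding from time() is only an assumption for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Counts real srand() calls; exists only so the sketch can be checked. */
static int seed_calls = 0;

/* Hypothetical stand-in for get_next_random(): seeds on first use only. */
static int next_random_seeded_once(void)
{
    static int inityet = 0;

    if (!inityet) {
        inityet = 1;
        srand((unsigned int) time(NULL));
        seed_calls++;
    }
    return rand();
}

/* Hypothetical stand-in for cl_rand_from_interval(): value in [a, b]. */
static int rand_from_interval_sketch(int a, int b)
{
    assert(a <= b);
    /* Modulo bias is ignored; good enough for jittering a delay. */
    return a + next_random_seeded_once() % (b - a + 1);
}
```

However many callers go through the wrapper, srand() runs exactly once
per process, and nobody else needs to call it.
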

Would it help to call g_main_loop_new() earlier?
Can we catch the "no GMainLoop yet" case more cleanly in
get_more_random()?
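
Catching the "no loop yet" case could look roughly like the sketch below.
Everything in it is hypothetical: struct loop_sketch stands in for
GMainLoop so that it builds without glib, and I am only assuming how the
real code gets at its loop. With glib, the point would be to test the
pointer for NULL before handing it to g_main_loop_is_running().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for GMainLoop so this builds without glib. */
struct loop_sketch { int running; };

/* Hypothetical: stays NULL until the daemon has created its main loop. */
static struct loop_sketch *main_loop_sketch = NULL;

/* Counts values produced without a usable loop; for checking only. */
static int sync_fallbacks = 0;

/* NULL-safe check: never dereference a loop that does not exist yet. */
static int loop_is_usable(const struct loop_sketch *loop)
{
    return loop != NULL && loop->running;
}

/* Hypothetical stand-in for the code path that consults the loop. */
static int next_random_guarded(void)
{
    if (!loop_is_usable(main_loop_sketch)) {
        /* No loop yet: leave it alone and just produce a value now. */
        sync_fallbacks++;
        return rand();
    }
    /* Whatever the real code does with a running loop would go here. */
    return rand();
}
```

Callers that run before the main loop exists, as cib and crmd do here,
would then get a value without tripping the assertion.
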

Should we just drop get_next_random() from cl_rand_from_interval()?
Or drop it altogether, along with get_more_random() and its static
array? It's not as if generating random numbers were performance
critical in any way.

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.