On Thu, 6 Apr 2006, Alan Robertson wrote:

> Andrew Beekhof wrote:
> > On 3/8/06, Joachim Banzhaf (compuserve) <[EMAIL PROTECTED]> wrote:
> >> On Tuesday, 7 March 2006 18:08, Andrew Beekhof wrote:
> >>> On 3/1/06, David Lee <[EMAIL PROTECTED]> wrote:
> >>>> On Wed, 1 Mar 2006, mkinikoglu wrote:
> >>>>> i setup linux-ha to two solaris boxes. (5.9 sparc). when i start
> >>>>> heartbeat i got these errors,
> >>>>> what does it mean return code 139?
> >>>> The meanings of such code, and the use of "crmd" are not my particular
> >>>> area.
> >>> Even with the crmd being my area, 139 still doesn't mean anything to me.
> >>> Were there any logs from the crmd and/or cib?
> >> I guess crmd received signal 11 (139 - 128).
> >
> > In which case there should definitely be a core file... but Heartbeat
> > will normally indicate that.  Odd.
>
> This is Solaris.  Maybe core files are disabled on his machine?

(Resurrecting a thread from three weeks ago)

The original user's problem (or something very like it) has now also
occurred for me, so I've taken a deeper look.

(BTW: yes, it did drop a core file... nice.)
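
(An aside on the "139 - 128" arithmetic quoted above: shells conventionally
report a process killed by signal N as exit status 128 + N.  I am assuming
that the "return code 139" in the original report follows that convention.
A minimal sketch of the decoding follows; the function names are purely
illustrative and are not part of Heartbeat.)

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of Heartbeat.
 * Assumes the shell convention "exit status = 128 + signal number".
 * Returns the signal number, or 0 if the status does not look like
 * a death-by-signal status. */
static int signal_from_shell_status(int status)
{
    return (status > 128 && status < 256) ? status - 128 : 0;
}

/* Write a one-line description of the status into buf. */
static void describe_status(int status, char *buf, size_t size)
{
    int sig = signal_from_shell_status(status);

    assert(buf != NULL && size > 0);
    if (sig != 0)
        snprintf(buf, size, "status %d: killed by signal %d", status, sig);
    else
        snprintf(buf, size, "status %d: no signal indicated", status);
}
```

Under that assumption a status of 139 decodes to signal 11, which is
SIGSEGV on the platforms I know of, and that fits the strlen() crash in
the traceback below.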


The gdb traceback is:
------------------------------
#0  0xfefb44e4 in strlen () from /usr/lib/libc.so.1
(gdb) where
#0  0xfefb44e4 in strlen () from /usr/lib/libc.so.1
#1  0xff006c30 in _doprnt () from /usr/lib/libc.so.1
#2  0xff008ca0 in vsnprintf () from /usr/lib/libc.so.1
#3  0xff36f954 in cl_log (priority=7,
    fmt=0x1d848 "recv msg %s from %s, status:%s")
    at ../../../lib/clplumbing/cl_log.c:584
#4  0x00013028 in ccm_control_process (info=0x3aaa0, hb=0x32b70)
    at ../../../membership/ccm/ccm.c:133
#5  0xff3697e0 in G_CH_dispatch_int (source=0x371e8, callback=0, user_data=0x0)
    at ../../../lib/clplumbing/GSource.c:610
#6  0xff244220 in g_main_dispatch () from /opt/csw/lib/libglib-2.0.so.0
#7  0xff245ad8 in g_main_context_dispatch () from /opt/csw/lib/libglib-2.0.so.0
#8  0xff246150 in g_main_context_iterate () from /opt/csw/lib/libglib-2.0.so.0
#9  0xff246ac8 in g_main_loop_run () from /opt/csw/lib/libglib-2.0.so.0
#10 0x00015d14 in main (argc=1, argv=0xffbff9bc)
    at ../../../membership/ccm/ccmmain.c:287
(gdb)
------------------------------


In "membership/ccm/ccm.c" (gdb frame #4 above) the code is:
------------------------------
                type = ha_msg_value(msg, F_TYPE);
                orig = ha_msg_value(msg, F_ORIG);
                status = ha_msg_value(msg, F_STATUS);

                ccm_debug(LOG_DEBUG, "recv msg %s from %s, status:%s"
                ,       type, orig, status);
------------------------------


Looking at the values:
------------------------------
(gdb) print type
$3 = 0x39da8 "resource"
(gdb) print orig
$4 = 0x36928 "shiel"
(gdb) print status
$5 = 0x0
(gdb)
------------------------------


So that's the problem: calling a "printf"-like routine with a null pointer
(variable "status") for a "%s" value.  Passing a null pointer for "%s" is
undefined behaviour in C.

(Now it may be that some OS implementations of "vsnprintf()" etc. try to
be "helpful" and to tolerate this, but this simply masks a lurking
portability problem.)

A quick fix would be to adjust this calling code in "membership/ccm/ccm.c"
to convert a null pointer to a pointer to an empty string.  But is this the
best solution?
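
To make the quick fix concrete, here is a minimal sketch.  The names
"safe_str" and "format_recv_msg" are my own invention for illustration,
not existing Heartbeat functions, and the sketch uses snprintf() as a
stand-in for ccm_debug() so that the result can be inspected.  It
substitutes the visible placeholder "(null)" rather than an empty string;
that is only one possible choice.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of Heartbeat: substitute a printable
 * placeholder for a null pointer before it reaches a "%s" conversion. */
static const char *safe_str(const char *s)
{
    return s != NULL ? s : "(null)";
}

/* Stand-in for the ccm_debug() call in ccm.c, writing to a buffer so
 * the result can be inspected.  Returns what snprintf() returns. */
static int format_recv_msg(char *buf, size_t size, const char *type,
                           const char *orig, const char *status)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "recv msg %s from %s, status:%s",
                    safe_str(type), safe_str(orig), safe_str(status));
}
```

With the values from the core file (type "resource", orig "shiel", status
null) this produces "recv msg resource from shiel, status:(null)" instead
of crashing.  It only protects this one call site, of course.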

Should "status" ever be a null pointer?

What other occurrences may lurk?

Etc.

Advice welcome!


-- 

:  David Lee                                I.T. Service          :
:  Senior Systems Programmer                Computer Centre       :
:                                           Durham University     :
:  http://www.dur.ac.uk/t.d.lee/            South Road            :
:                                           Durham DH1 3LE        :
:  Phone: +44 191 334 2752                  U.K.                  :
_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
