Hi, Sorry if this is a repeat. I have checked the faq.
I get this error 1 out of 20 times when starting heartbeat for the first
time.
I'm running version 2.99.3.
Here's my log.
Jul 9 14:28:45 (none) heartbeat: [760]: WARN: Core dumps could be lost if
multiple dumps occur.
Jul 9 14:28:45 (none) heartbeat: [760]: WARN: Consider setting
non-default value in /proc/sys/kernel/core_pattern (or equivalent) f
or maximum supportability
Jul 9 14:28:45 (none) heartbeat: [760]: WARN: Consider setting
/proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supp
ortability
Jul 9 14:28:45 (none) heartbeat: [760]: info: Version 2 support: false
Jul 9 14:28:45 (none) heartbeat: [760]: WARN: Logging daemon is disabled
--enabling logging daemon is recommended
Jul 9 14:28:46 (none) heartbeat: [760]: info: **************************
Jul 9 14:28:46 (none) heartbeat: [760]: info: Configuration validated.
Starting heartbeat 2.99.3
Jul 9 14:28:46 (none) heartbeat: [761]: info: heartbeat: version 2.99.3
Jul 9 14:28:46 (none) heartbeat: [761]: info: Heartbeat generation:
1247147626
Jul 9 14:50:08 (none) last message repeated 61 timesUDP Broadcast
heartbeat started on port 694 (694) interface eth1
Jul 9 14:28:46 (none) heartbeat: [761]: info: glib: UDP Broadcast
heartbeat closed on port 694 interface eth1 - Status: 1
Jul 9 14:28:46 (none) heartbeat: [761]: info: glib: ucast: write socket
priority set to IPTOS_LOWDELAY on eth0
Jul 9 14:28:46 (none) heartbeat: [761]: info: glib: ucast: bound send
socket to device: eth0
Jul 9 14:28:46 (none) heartbeat: [761]: info: glib: ucast: bound receive
socket to device: eth0
Jul 9 14:28:46 (none) heartbeat: [761]: info: glib: ucast: started on
port 694 interface eth0 to 147.139.22.45
Jul 9 14:28:46 (none) heartbeat: [761]: info: glib: ucast: write socket
priority set to IPTOS_LOWDELAY on eth0
Jul 9 14:28:46 (none) heartbeat: [761]: info: glib: ucast: bound send
socket to device: eth0
Jul 9 14:28:46 (none) heartbeat: [761]: info: glib: ucast: bound receive
socket to device: eth0
Jul 9 14:28:46 (none) heartbeat: [761]: info: glib: ucast: started on
port 694 interface eth0 to 147.139.22.44
Jul 9 14:28:46 (none) heartbeat: [761]: info: glib: ping heartbeat
started.
Jul 9 14:28:46 (none) heartbeat: [761]: info: G_main_add_TriggerHandler:
Added signal manual handler
Jul 9 14:28:46 (none) heartbeat: [761]: info: G_main_add_TriggerHandler:
Added signal manual handler
Jul 9 14:28:46 (none) heartbeat: [761]: info: G_main_add_SignalHandler:
Added signal handler for signal 17
Jul 9 14:28:46 (none) heartbeat: [761]: info: Local status now set to:
'up'
Jul 9 14:28:47 (none) heartbeat: [761]: info: Link
147.139.16.250:147.139.16.250 up.
Jul 9 14:28:47 (none) heartbeat: [761]: info: Status update for node
147.139.16.250: status ping
Jul 9 14:28:47 (none) heartbeat: [761]: info: Link ems2:eth1 up.
Jul 9 14:28:48 (none) heartbeat: [764]: ERROR: fifo_child: here???
Jul 9 14:29:06 (none) last message repeated 18 times
Jul 9 14:29:06 (none) heartbeat: [761]: WARN: node ems1: is dead
Jul 9 14:29:06 (none) heartbeat: [761]: info: Comm_now_up(): updating
status to active
Jul 9 14:29:06 (none) heartbeat: [761]: info: Local status now set to:
'active'
Jul 9 14:29:06 (none) heartbeat: [761]: info: Starting child client
"/usr/lib/heartbeat/ipfail" (6343,65)
Jul 9 14:29:06 (none) heartbeat: [761]: WARN: No STONITH device
configured.
Jul 9 14:29:06 (none) heartbeat: [761]: WARN: Shared disks are not
protected.
Jul 9 14:29:06 (none) heartbeat: [761]: info: Resources being acquired
from ems1.
Jul 9 14:29:06 (none) heartbeat: [775]: info: Starting
"/usr/lib/heartbeat/ipfail" as uid 6343 gid 65 (pid 775)
Jul 9 14:29:07 (none) heartbeat: [764]: WARN: ha_msg_add_nv_depth: line
doesn't contain '='
Jul 9 14:29:07 (none) heartbeat: [764]: info: >>>
Jul 9 14:29:07 (none) heartbeat: [764]: ERROR: NV failure (msgfromsteam):
[>>> ]
Jul 9 14:29:07 (none) heartbeat: [764]: ERROR: fifo_child: here???
Jul 9 14:29:07 (none) heartbeat: [777]: info: Local Resource acquisition
completed.
Jul 9 14:29:07 (none) heartbeat: [761]: info: Initial resource
acquisition complete (T_RESOURCES(us))
Jul 9 14:29:08 (none) heartbeat: [764]: ERROR: fifo_child: here???
Jul 9 14:29:16 (none) last message repeated 8 times
Jul 9 14:29:17 (none) heartbeat: [761]: info: Local Resource acquisition
completed. (none)
Jul 9 14:29:17 (none) heartbeat: [761]: info: local resource transition
completed.
Jul 9 14:29:17 (none) heartbeat: [764]: ERROR: fifo_child: here???
Is this a know problem or is it me?
I did add some debug in function fifo_child(), as shown below.
/* Make sure we check for death of parent every so often... */
for (;;) {
setmsalarm(1000L);
msg = msgfromstream(fifo);
setmsalarm(0L);
hb_check_mcp_alive();
hb_signal_process_pending();
if (msg) {
IPC_Message* m;
if (DEBUGDETAILS) {
cl_log(LOG_DEBUG, "fifo_child message:");
cl_log_message(LOG_DEBUG, msg);
}
m = hamsg2ipcmsg(msg, chan);
if (m) {
/* Send frees "m" "at the right time" */
chan->ops->send(chan, m);
hb_check_mcp_alive();
hb_signal_process_pending();
chan->ops->waitout(chan);
hb_check_mcp_alive();
hb_signal_process_pending();
}
ha_msg_del(msg);
msg = NULL;
}else if (feof(fifo)) {
cl_log(LOG_ERR, "fifo_child: EOF on FIFO");
if (ANYDEBUG) {
return_to_orig_privs();
cl_log(LOG_DEBUG
, "fifo_child: EOF on FIFO");
}
hb_check_mcp_alive();
exit(2);
} else {
cl_log(LOG_ERR, "fifo_child: here???");
}
cl_cpu_limit_update();
cl_realtime_malloc_check();
hb_check_mcp_alive();
hb_signal_process_pending();
}
/*notreached*/
Many Thanks,
Ken
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems