On 29/11/2007, Dejan Muhamedagic <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> On Thu, Nov 29, 2007 at 10:25:47AM +0000, Amos Shapira wrote:
> > On 29/11/2007, Dejan Muhamedagic <[EMAIL PROTECTED]> wrote:
> > > Yes, very much so. For some reason the MCP (master control
> > > process) doesn't start the rest of the programs which are doing
> > > the real work. I really can't say why. Can you please attach the
> > > logs from this node?
> >
> > A pstree(1) on the better node visualizes the responsibility of
> > starting the programs pretty vividly:
> >
> >   |-heartbeat,18449
> >   |   |-attrd,18477
> >   |   |-ccm,18473
> >   |   |-cib,18474
> >   |   |-crmd,18478
> >   |   |   |-pengine,18505
> >   |   |   `-tengine,18504
> >   |   |-heartbeat,18452
> >   |   |-heartbeat,18453
> >   |   |-heartbeat,18454
> >   |   |-heartbeat,18455
> >   |   |-heartbeat,18456
> >   |   |-lrmd,18475 -r
> >   |   |-mgmtd,18479 -v
> >   |   `-stonithd,18476
> >
> > Here they are again (from tonight):
> >
> >  1 heartbeat[17481]: 2007/11/29_07:12:40 WARN: heartbeat: udp port 695 reserved for service "ieee-mms-ssl".
> >  2 heartbeat[17481]: 2007/11/29_07:12:40 info: Version 2 support: yes
> >  3 heartbeat[17481]: 2007/11/29_07:12:40 WARN: File /etc/ha.d/haresources exists.
> >  4 heartbeat[17481]: 2007/11/29_07:12:40 WARN: This file is not used because crm is enabled
> >  5 heartbeat[17481]: 2007/11/29_07:12:40 WARN: Logging daemon is disabled --enabling logging daemon is recommended
> >  6 heartbeat[17481]: 2007/11/29_07:12:40 info: **************************
> >  7 heartbeat[17481]: 2007/11/29_07:12:40 info: Configuration validated. Starting heartbeat 2.1.2
> >  8 heartbeat[17482]: 2007/11/29_07:12:40 info: heartbeat: version 2.1.2
> >  9 heartbeat[17482]: 2007/11/29_07:12:40 info: Heartbeat generation: 1196102397
> > 10 heartbeat[17482]: 2007/11/29_07:12:40 info: G_main_add_TriggerHandler: Added signal manual handler
> > 11 heartbeat[17482]: 2007/11/29_07:12:40 info: G_main_add_TriggerHandler: Added signal manual handler
> > 12 heartbeat[17482]: 2007/11/29_07:12:40 info: Removing /var/run/heartbeat/rsctmp failed, recreating.
> > 13 heartbeat[17482]: 2007/11/29_07:12:40 info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth0
> > 14 heartbeat[17482]: 2007/11/29_07:12:40 info: glib: ucast: bound send socket to device: eth0
> > 15 heartbeat[17482]: 2007/11/29_07:12:40 info: glib: ucast: bound receive socket to device: eth0
> > 16 heartbeat[17482]: 2007/11/29_07:12:40 info: glib: ucast: started on port 695 interface eth0 to 192.168.0.248
> > 17 heartbeat[17482]: 2007/11/29_07:12:40 info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth0
> > 18 heartbeat[17482]: 2007/11/29_07:12:40 info: glib: ucast: bound send socket to device: eth0
> > 19 heartbeat[17482]: 2007/11/29_07:12:40 info: glib: ucast: bound receive socket to device: eth0
> > 20 heartbeat[17482]: 2007/11/29_07:12:40 info: glib: ucast: started on port 695 interface eth0 to 192.168.0.249
> > 21 heartbeat[17482]: 2007/11/29_07:12:40 info: G_main_add_SignalHandler: Added signal handler for signal 17
> > 22 heartbeat[17482]: 2007/11/29_07:12:40 info: Local status now set to: 'up'
> > 23 heartbeat[17482]: 2007/11/29_07:12:41 info: Link drbd01.test.spammatters.local:eth0 up.
> > 24 heartbeat[17482]: 2007/11/29_07:12:41 info: Status update for node drbd01.test.spammatters.local: status up
> > 25 heartbeat[17482]: 2007/11/29_07:13:45 info: all clients are now paused
> > 26 heartbeat[17482]: 2007/11/29_07:13:45 debug: hist->ackseq =0
> > 27 heartbeat[17482]: 2007/11/29_07:13:45 debug: hist->lowseq =0, hist->hiseq=101
> > 28 heartbeat[17482]: 2007/11/29_07:13:45 debug: expecting from drbd01.test.spammatters.local
> > 29 heartbeat[17482]: 2007/11/29_07:13:45 debug: it's ackseq=0
>
> heartbeat is getting no packet acknowledgements from drbd01. It
> must be a communication problem. Looks like drbd02 doesn't see
> packets coming from drbd01, assuming that it's sending them,
> which it does if there are no errors reported in drbd01.
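
For anyone who wants to check a longer log for the same symptom, here is a minimal Python sketch. It is not part of heartbeat: the function name `scan_heartbeat_log` and both regular expressions are my own assumptions, modelled only on the log lines quoted above, and they expect each log entry on a single unwrapped line. It collects the timestamps of "all clients are now paused" messages and the nodes named in "expecting from" debug lines, which the diagnosis above reads as peers whose acknowledgements are missing.

```python
import re

# Assumed pattern for entries such as:
#   heartbeat[17482]: 2007/11/29_07:13:45 info: all clients are now paused
PAUSED = re.compile(
    r"heartbeat\[\d+\]:\s+(?P<ts>\S+)\s+info:\s+all\s+clients\s+are\s+now\s+paused"
)

# Assumed pattern for entries such as:
#   heartbeat[17482]: 2007/11/29_07:13:45 debug: expecting from drbd01.test.spammatters.local
EXPECTING = re.compile(
    r"heartbeat\[\d+\]:\s+(?P<ts>\S+)\s+debug:\s+expecting\s+from\s+(?P<node>\S+)"
)


def scan_heartbeat_log(text):
    """Return (paused_timestamps, nodes) found in the given log text.

    paused_timestamps: timestamps of "all clients are now paused" entries,
                       in the order they appear.
    nodes:             sorted, de-duplicated node names taken from
                       "expecting from <node>" debug entries.
    """
    paused = [m.group("ts") for m in PAUSED.finditer(text)]
    nodes = sorted({m.group("node") for m in EXPECTING.finditer(text)})
    return paused, nodes


if __name__ == "__main__":
    # Sample entries copied from the log quoted in this thread.
    sample = (
        "heartbeat[17482]: 2007/11/29_07:13:45 info: all clients are now paused\n"
        "heartbeat[17482]: 2007/11/29_07:13:45 debug: hist->ackseq =0\n"
        "heartbeat[17482]: 2007/11/29_07:13:45 debug: expecting from "
        "drbd01.test.spammatters.local\n"
    )
    paused, nodes = scan_heartbeat_log(sample)
    print("paused at:", paused)
    print("waiting on:", nodes)
```

A non-empty node list only says that heartbeat logged it was waiting on that peer; it does not tell you whether the packets were lost on the wire or never sent.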

Wouldn't this be the case if crmd crashes? Could this be related to
"stonith -h" seg-faulting and the missing processes (crmd, cib, attrd, ccm,
lrmd, mgmtd, stonithd) which I can see on the other node?

I'll try again with the default port, in case this matters.

Thanks.

--Amos

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
