On Fri, Nov 30, 2007 at 05:16:38PM +1100, Amos Shapira wrote:
> On 30/11/2007, Dejan Muhamedagic <[EMAIL PROTECTED]> wrote:
> >
> > Hi,
> >
> > On Thu, Nov 29, 2007 at 05:23:33PM +0000, Amos Shapira wrote:
> > > On 29/11/2007, Dejan Muhamedagic <[EMAIL PROTECTED]> wrote:
> > > >
> > > > Hi,
> > > >
> > > > On Thu, Nov 29, 2007 at 10:25:47AM +0000, Amos Shapira wrote:
> > > > > On 29/11/2007, Dejan Muhamedagic <[EMAIL PROTECTED]> wrote:
> > > > > > Yes, very much so. For some reason the MCP (master control
> > > > > > process) doesn't start the rest of the programs which are doing
> > > > > > the real work. I really can't say why. Can you please attach the
> > > > > > logs from this node?
> > > > >
> > > > > A pstree(1) on the better node visualizes the responsibility of
> > > > > starting the programs pretty vividly:
> > > > >
> > > > > |-heartbeat,18449
> > > > > | |-attrd,18477
> > > > > | |-ccm,18473
> > > > > | |-cib,18474
> > > > > | |-crmd,18478
> > > > > | | |-pengine,18505
> > > > > | | `-tengine,18504
> > > > > | |-heartbeat,18452
> > > > > | |-heartbeat,18453
> > > > > | |-heartbeat,18454
> > > > > | |-heartbeat,18455
> > > > > | |-heartbeat,18456
> > > > > | |-lrmd,18475 -r
> > > > > | |-mgmtd,18479 -v
> > > > > | `-stonithd,18476
> > > > >
> > > > > Here they are again (from tonight):
> > > > >
> > > > > 1 heartbeat[17481]: 2007/11/29_07:12:40 WARN: heartbeat: udp port 695 reserved for service "ieee-mms-ssl".
> > > > > 2 heartbeat[17481]: 2007/11/29_07:12:40 info: Version 2 support: yes
> > > > > 3 heartbeat[17481]: 2007/11/29_07:12:40 WARN: File /etc/ha.d/haresources exists.
> > > > > 4 heartbeat[17481]: 2007/11/29_07:12:40 WARN: This file is not used because crm is enabled
> > > > > 5 heartbeat[17481]: 2007/11/29_07:12:40 WARN: Logging daemon is disabled --enabling logging daemon is recommended
> > > > > 6 heartbeat[17481]: 2007/11/29_07:12:40 info: **************************
> > > > > 7 heartbeat[17481]: 2007/11/29_07:12:40 info: Configuration validated. Starting heartbeat 2.1.2
> > > > > 8 heartbeat[17482]: 2007/11/29_07:12:40 info: heartbeat: version 2.1.2
> > > > > 9 heartbeat[17482]: 2007/11/29_07:12:40 info: Heartbeat generation: 1196102397
> > > > > 10 heartbeat[17482]: 2007/11/29_07:12:40 info: G_main_add_TriggerHandler: Added signal manual handler
> > > > > 11 heartbeat[17482]: 2007/11/29_07:12:40 info: G_main_add_TriggerHandler: Added signal manual handler
> > > > > 12 heartbeat[17482]: 2007/11/29_07:12:40 info: Removing /var/run/heartbeat/rsctmp failed, recreating.
> > > > > 13 heartbeat[17482]: 2007/11/29_07:12:40 info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth0
> > > > > 14 heartbeat[17482]: 2007/11/29_07:12:40 info: glib: ucast: bound send socket to device: eth0
> > > > > 15 heartbeat[17482]: 2007/11/29_07:12:40 info: glib: ucast: bound receive socket to device: eth0
> > > > > 16 heartbeat[17482]: 2007/11/29_07:12:40 info: glib: ucast: started on port 695 interface eth0 to 192.168.0.248
> > > > > 17 heartbeat[17482]: 2007/11/29_07:12:40 info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth0
> > > > > 18 heartbeat[17482]: 2007/11/29_07:12:40 info: glib: ucast: bound send socket to device: eth0
> > > > > 19 heartbeat[17482]: 2007/11/29_07:12:40 info: glib: ucast: bound receive socket to device: eth0
> > > > > 20 heartbeat[17482]: 2007/11/29_07:12:40 info: glib: ucast: started on port 695 interface eth0 to 192.168.0.249
> > > > > 21 heartbeat[17482]: 2007/11/29_07:12:40 info: G_main_add_SignalHandler: Added signal handler for signal 17
> > > > > 22 heartbeat[17482]: 2007/11/29_07:12:40 info: Local status now set to: 'up'
> > > > > 23 heartbeat[17482]: 2007/11/29_07:12:41 info: Link drbd01.test.spammatters.local:eth0 up.
> > > > > 24 heartbeat[17482]: 2007/11/29_07:12:41 info: Status update for node drbd01.test.spammatters.local: status up
> > > > > 25 heartbeat[17482]: 2007/11/29_07:13:45 info: all clients are now paused
> > > > > 26 heartbeat[17482]: 2007/11/29_07:13:45 debug: hist->ackseq =0
> > > > > 27 heartbeat[17482]: 2007/11/29_07:13:45 debug: hist->lowseq =0, hist->hiseq=101
> > > > > 28 heartbeat[17482]: 2007/11/29_07:13:45 debug: expecting from drbd01.test.spammatters.local
> > > > > 29 heartbeat[17482]: 2007/11/29_07:13:45 debug: it's ackseq=0
> > > >
> > > > heartbeat is getting no packet acknowledgements from drbd01. It
> > > > must be a communication problem. Looks like drbd02 doesn't see
> > > > packets coming from drbd01, assuming that it's sending them,
> > > > which it does if there are no errors reported in drbd01.
> > > >
> > >
> > > Wouldn't this be the case if crmd crashes? Could this be related to
> > > "stonith -h" seg-faulting and the missing processes (crmd, cib, attrd,
> > > ccm, lrmd, mgmtd, stonithd) which I can see on the other node?
> >
> > No. There's an IPC layer which is used by heartbeat (the process)
> > only. If that doesn't work, it won't start other programs.
> >
> I did some more experimentation - I installed a third machine identical to
> the second one but still get the same results.
Then perhaps the problem is on the good host. Did you try to make a
cluster of only the second and the third hosts?

> One thing that I managed to change (on both the new machine and the previous
> "secondary") is that by moving aside the content of
> /usr/lib64/stonith/plugins/stonith2 and leaving only the "null" plugin in
> there I could get rid of the "stonith -h" segmentation fault (and I don't
> have any of the devices these plugins talk to anyway).

The stonith program problem is definitely annoying, but it is not
going to influence your cluster in any way.

> But still I don't see crmd and friends on any machine except for the
> primary.
>
> > Anyway, you would definitely see error messages if a program
> > can't be started.
> >
> Where should I look for it? The init.d script forwards most everything into
> /dev/null.

It has nothing to do with the init script. The heartbeat MCP (master
control process) starts all other processes itself. The default
syslog facility is daemon (2.0.x releases used local7).

Thanks,

Dejan

> > > I'll try again with the default port, in case this matters.
> >
> > No, it shouldn't matter.
>
> Apparently it didn't matter :^)
>
> Thanks very much for your time.
>
> --Amos
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
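Since the heartbeat MCP itself starts the other daemons, one quick per-node check is to compare what is actually running against the children the MCP spawned on the working node, as listed in the pstree output quoted above. The following is a minimal, hypothetical Python 3 sketch, not part of Heartbeat: the names `EXPECTED_CRM_DAEMONS`, `running_process_names` and `missing_daemons` are made up for illustration, and the expected set is copied from that one pstree listing, so it is an assumption for any other configuration (mgmtd, for instance, only runs if ha.cf is set up to respawn it).

```python
import os

# Hypothetical helper, not part of Heartbeat. The expected names are copied
# from the pstree of the working node in this thread (Heartbeat 2.1.2 with
# crm enabled); adjust them for other setups. mgmtd, for example, only runs
# if ha.cf is configured to respawn it.
EXPECTED_CRM_DAEMONS = frozenset({
    "attrd", "ccm", "cib", "crmd", "lrmd", "mgmtd", "stonithd",
    "pengine", "tengine",  # started by crmd rather than by heartbeat itself
})


def running_process_names(proc_root="/proc"):
    """Return the command names of all running processes (Linux only).

    Parses the "(comm)" field of /proc/<pid>/stat, which is available on
    old and new kernels alike. Processes that exit during the scan are skipped.
    """
    names = set()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/net
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                stat = f.read()
        except OSError:
            continue  # the process went away between listdir() and open()
        # The command name sits between the first "(" and the last ")".
        names.add(stat[stat.index("(") + 1:stat.rindex(")")])
    return names


def missing_daemons(running, expected=EXPECTED_CRM_DAEMONS):
    """Return, sorted, the expected daemon names absent from `running`."""
    return sorted(set(expected) - set(running))
```

On a node in the state described in this thread, where only heartbeat processes show up, `missing_daemons(running_process_names())` would list every CRM daemon; an empty list means the MCP started everything that was seen on the primary. Keeping the /proc scan separate from the comparison makes the comparison easy to check without a live cluster.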
