Hi,

On Thu, Nov 29, 2007 at 10:25:47AM +0000, Amos Shapira wrote:
> On 29/11/2007, Dejan Muhamedagic <[EMAIL PROTECTED]> wrote:
> > Yes, very much so. For some reason the MCP (master control
> > process) doesn't start the rest of the programs which are doing
> > the real work. I really can't say why. Can you please attach the
> > logs from this node?
> 
> A pstree(1) on the better node visualizes the responsibility of
> starting the programs pretty vividly:
> 
>   |-heartbeat,18449
>   |   |-attrd,18477
>   |   |-ccm,18473
>   |   |-cib,18474
>   |   |-crmd,18478
>   |   |   |-pengine,18505
>   |   |   `-tengine,18504
>   |   |-heartbeat,18452
>   |   |-heartbeat,18453
>   |   |-heartbeat,18454
>   |   |-heartbeat,18455
>   |   |-heartbeat,18456
>   |   |-lrmd,18475 -r
>   |   |-mgmtd,18479 -v
>   |   `-stonithd,18476
> 
> Here they are again (from tonight):
> 
>       1 heartbeat[17481]: 2007/11/29_07:12:40 WARN: heartbeat: udp port 695 reserved for service "ieee-mms-ssl".
>       2 heartbeat[17481]: 2007/11/29_07:12:40 info: Version 2 support: yes
>       3 heartbeat[17481]: 2007/11/29_07:12:40 WARN: File /etc/ha.d/haresources exists.
>       4 heartbeat[17481]: 2007/11/29_07:12:40 WARN: This file is not used because crm is enabled
>       5 heartbeat[17481]: 2007/11/29_07:12:40 WARN: Logging daemon is disabled --enabling logging daemon is recommended
>       6 heartbeat[17481]: 2007/11/29_07:12:40 info: **************************
>       7 heartbeat[17481]: 2007/11/29_07:12:40 info: Configuration validated. Starting heartbeat 2.1.2
>       8 heartbeat[17482]: 2007/11/29_07:12:40 info: heartbeat: version 2.1.2
>       9 heartbeat[17482]: 2007/11/29_07:12:40 info: Heartbeat generation: 1196102397
>      10 heartbeat[17482]: 2007/11/29_07:12:40 info: G_main_add_TriggerHandler: Added signal manual handler
>      11 heartbeat[17482]: 2007/11/29_07:12:40 info: G_main_add_TriggerHandler: Added signal manual handler
>      12 heartbeat[17482]: 2007/11/29_07:12:40 info: Removing /var/run/heartbeat/rsctmp failed, recreating.
>      13 heartbeat[17482]: 2007/11/29_07:12:40 info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth0
>      14 heartbeat[17482]: 2007/11/29_07:12:40 info: glib: ucast: bound send socket to device: eth0
>      15 heartbeat[17482]: 2007/11/29_07:12:40 info: glib: ucast: bound receive socket to device: eth0
>      16 heartbeat[17482]: 2007/11/29_07:12:40 info: glib: ucast: started on port 695 interface eth0 to 192.168.0.248
>      17 heartbeat[17482]: 2007/11/29_07:12:40 info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth0
>      18 heartbeat[17482]: 2007/11/29_07:12:40 info: glib: ucast: bound send socket to device: eth0
>      19 heartbeat[17482]: 2007/11/29_07:12:40 info: glib: ucast: bound receive socket to device: eth0
>      20 heartbeat[17482]: 2007/11/29_07:12:40 info: glib: ucast: started on port 695 interface eth0 to 192.168.0.249
>      21 heartbeat[17482]: 2007/11/29_07:12:40 info: G_main_add_SignalHandler: Added signal handler for signal 17
>      22 heartbeat[17482]: 2007/11/29_07:12:40 info: Local status now set to: 'up'
>      23 heartbeat[17482]: 2007/11/29_07:12:41 info: Link drbd01.test.spammatters.local:eth0 up.
>      24 heartbeat[17482]: 2007/11/29_07:12:41 info: Status update for node drbd01.test.spammatters.local: status up
>      25 heartbeat[17482]: 2007/11/29_07:13:45 info: all clients are now paused
>      26 heartbeat[17482]: 2007/11/29_07:13:45 debug: hist->ackseq =0
>      27 heartbeat[17482]: 2007/11/29_07:13:45 debug: hist->lowseq =0, hist->hiseq=101
>      28 heartbeat[17482]: 2007/11/29_07:13:45 debug: expecting from drbd01.test.spammatters.local
>      29 heartbeat[17482]: 2007/11/29_07:13:45 debug: it's ackseq=0

Heartbeat is getting no packet acknowledgements from drbd01
(hist->ackseq stays at 0), so it must be a communication problem.
It looks like drbd02 doesn't see the packets coming from drbd01,
assuming drbd01 is sending them, which it is if there are no
errors reported on drbd01.
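A quick way to spot this pattern in a long log is to pair each
"hist->ackseq" debug line with the "hist->hiseq" line that follows
it. What follows is a minimal sketch, assuming the raw log file has
one entry per line (the email above wrapped them) and that the
debug lines look exactly like lines 26 and 27 quoted above. The
helper name stalled_acks is hypothetical, not part of heartbeat.

```python
import re

# Patterns for the debug lines quoted above, e.g.
#   "debug: hist->ackseq =0"
#   "debug: hist->lowseq =0, hist->hiseq=101"
ACKSEQ_RE = re.compile(r"hist->ackseq\s*=\s*(\d+)")
HISEQ_RE = re.compile(r"hist->hiseq\s*=\s*(\d+)")


def stalled_acks(lines):
    """Return the hiseq values seen while the peer's ackseq was 0.

    Hypothetical helper: each ackseq value is paired with the next
    hiseq value; a pair is reported when packets were sent
    (hiseq > 0) but nothing was acknowledged (ackseq == 0).
    """
    stalled = []
    ackseq = None
    for line in lines:
        m = ACKSEQ_RE.search(line)
        if m:
            ackseq = int(m.group(1))
            continue
        m = HISEQ_RE.search(line)
        if m and ackseq is not None:
            hiseq = int(m.group(1))
            if ackseq == 0 and hiseq > 0:
                stalled.append(hiseq)
            ackseq = None  # wait for the next ackseq/hiseq pair
    return stalled
```

Fed lines 26 and 27 of the log above, it should return [101]:
drbd01 has acknowledged none of the 101 packets sent to it.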

Thanks,

Dejan

>      30 heartbeat[17482]: 2007/11/29_07:13:45 debug:
> 
> (The line numbers might come in handy in discussing them.)
> 
> The last five "debug:" lines repeat ad infinitum.
> 
> Thanks very much.
> 
> --Amos
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems