Hi,

On Thu, Nov 29, 2007 at 06:20:21PM +1100, Amos Shapira wrote:
> On 29/11/2007, Amos Shapira <[EMAIL PROTECTED]> wrote:
> > I might try to just purge everything, clear the /var directories and
> > start all over again.
> 
> I went ahead and removed all files under /var/lib/heartbeat on both
> machines and got back to the basics as given in
> http://wiki.centos.org/HowTos/Ha-Drbd and still get the same result -
> drbd01 runs, more or less, but drbd02's crm_mon doesn't connect.
> 
> I compared the installed packages on both machines and they are identical.
> 
> One thing that I noticed is that there are many more processes started
> on drbd01 (the "relatively good node") than on drbd02:
> 
> drbd01 processes:
> 
> root     18427     1  0 05:05 pts/0    00:00:00 ha_logd: read process
> root     18428 18427  0 05:05 pts/0    00:00:00 ha_logd: write process
> root     18449     1  0 05:05 ?        00:00:00 heartbeat: master control process
> nobody   18452 18449  0 05:05 ?        00:00:00 heartbeat: FIFO reader
> nobody   18453 18449  0 05:05 ?        00:00:00 heartbeat: write: ucast eth0
> nobody   18454 18449  0 05:05 ?        00:00:00 heartbeat: read: ucast eth0
> nobody   18455 18449  0 05:05 ?        00:00:00 heartbeat: write: ucast eth0
> nobody   18456 18449  0 05:05 ?        00:00:00 heartbeat: read: ucast eth0
> 90       18473 18449  0 05:07 ?        00:00:00 /usr/lib64/heartbeat/ccm
> 90       18474 18449  0 05:07 ?        00:00:00 /usr/lib64/heartbeat/cib
> root     18475 18449  0 05:07 ?        00:00:00 /usr/lib64/heartbeat/lrmd -r
> nobody   18476 18449  0 05:07 ?        00:00:00 /usr/lib64/heartbeat/stonithd
> 90       18477 18449  0 05:07 ?        00:00:00 /usr/lib64/heartbeat/attrd
> 90       18478 18449  0 05:07 ?        00:00:00 /usr/lib64/heartbeat/crmd
> root     18479 18449  0 05:07 ?        00:00:00 /usr/lib64/heartbeat/mgmtd -v
> 90       18504 18478  0 05:09 ?        00:00:00 /usr/lib64/heartbeat/tengine
> 90       18505 18478  0 05:09 ?        00:00:00 /usr/lib64/heartbeat/pengine

This looks OK.

> drbd02 processes:
> 
> root      2361     1  0 05:06 pts/1    00:00:00 ha_logd: read process
> root      2362  2361  0 05:06 pts/1    00:00:00 ha_logd: write process
> root      2383     1  0 05:06 ?        00:00:00 heartbeat: master control process
> nobody    2386  2383  0 05:06 ?        00:00:00 heartbeat: FIFO reader
> nobody    2387  2383  0 05:06 ?        00:00:00 heartbeat: write: ucast eth0
> nobody    2388  2383  0 05:06 ?        00:00:00 heartbeat: read: ucast eth0
> nobody    2389  2383  0 05:06 ?        00:00:00 heartbeat: write: ucast eth0
> nobody    2390  2383  0 05:06 ?        00:00:00 heartbeat: read: ucast eth0
> 
> Isn't it significant that programs like "crmd" are not running on drbd02?

Yes, very much so. For some reason the MCP (master control
process) on drbd02 doesn't start the rest of the programs that
do the real work (ccm, cib, lrmd, stonithd, attrd, crmd, mgmtd).
I really can't say why. Can you please attach the logs from
drbd02?
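
To make the comparison between the two process listings repeatable,
here is a minimal sketch that reports which daemons are missing from a
`ps -ef` listing. The daemon names are taken from the drbd01 listing
above; the function name `missing_daemons` is hypothetical, and the
code assumes the eight-column `ps -ef` layout shown in this thread.

```python
# Daemon names as they appear in the drbd01 listing quoted above.
EXPECTED = ["ccm", "cib", "lrmd", "stonithd", "attrd", "crmd", "mgmtd"]


def missing_daemons(ps_output, expected=EXPECTED):
    """Return the expected daemon names that do not appear in ps_output.

    Assumes `ps -ef` style lines: UID PID PPID C STIME TTY TIME CMD.
    """
    running = set()
    for line in ps_output.splitlines():
        fields = line.split(None, 7)  # the 8th field is the full command
        if len(fields) < 8:
            continue  # blank or malformed line
        command = fields[7].split()[0]            # drop arguments such as "-r"
        running.add(command.rsplit("/", 1)[-1])   # keep only the basename
    return [name for name in expected if name not in running]
```

To run it against a live node, the listing could be captured with
`subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout`
and passed to `missing_daemons`. An empty result would match what
drbd01 shows; on drbd02 as listed above, all seven names would be
reported.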

Thanks,

Dejan

> 
> Thanks,
> 
> --Amos
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
