Hi,

On Thu, Nov 29, 2007 at 06:20:21PM +1100, Amos Shapira wrote:
> On 29/11/2007, Amos Shapira <[EMAIL PROTECTED]> wrote:
> > I might try to just purge everything, clear the /var directories and
> > start all over again.
>
> I went ahead and removed all files under /var/lib/heartbeat on both
> machines and got back to the basics as given in
> http://wiki.centos.org/HowTos/Ha-Drbd and still get the same result -
> drbd01 runs, more or less, but drbd02's crm_mon doesn't connect.
>
> I compared the installed packages on both machines and they are identical.
>
> One thing that I noticed is that there are many more processes started
> on drbd01 (the "relatively good node") than on drbd02:
>
> drbd01 processes:
>
> root     18427     1  0 05:05 pts/0    00:00:00 ha_logd: read process
> root     18428 18427  0 05:05 pts/0    00:00:00 ha_logd: write process
> root     18449     1  0 05:05 ?        00:00:00 heartbeat: master control process
> nobody   18452 18449  0 05:05 ?        00:00:00 heartbeat: FIFO reader
> nobody   18453 18449  0 05:05 ?        00:00:00 heartbeat: write: ucast eth0
> nobody   18454 18449  0 05:05 ?        00:00:00 heartbeat: read: ucast eth0
> nobody   18455 18449  0 05:05 ?        00:00:00 heartbeat: write: ucast eth0
> nobody   18456 18449  0 05:05 ?        00:00:00 heartbeat: read: ucast eth0
> 90       18473 18449  0 05:07 ?        00:00:00 /usr/lib64/heartbeat/ccm
> 90       18474 18449  0 05:07 ?        00:00:00 /usr/lib64/heartbeat/cib
> root     18475 18449  0 05:07 ?        00:00:00 /usr/lib64/heartbeat/lrmd -r
> nobody   18476 18449  0 05:07 ?        00:00:00 /usr/lib64/heartbeat/stonithd
> 90       18477 18449  0 05:07 ?        00:00:00 /usr/lib64/heartbeat/attrd
> 90       18478 18449  0 05:07 ?        00:00:00 /usr/lib64/heartbeat/crmd
> root     18479 18449  0 05:07 ?        00:00:00 /usr/lib64/heartbeat/mgmtd -v
> 90       18504 18478  0 05:09 ?        00:00:00 /usr/lib64/heartbeat/tengine
> 90       18505 18478  0 05:09 ?        00:00:00 /usr/lib64/heartbeat/pengine

This looks OK.

> drbd02 processes:
>
> root      2361     1  0 05:06 pts/1    00:00:00 ha_logd: read process
> root      2362  2361  0 05:06 pts/1    00:00:00 ha_logd: write process
> root      2383     1  0 05:06 ?        00:00:00 heartbeat: master control process
> nobody    2386  2383  0 05:06 ?        00:00:00 heartbeat: FIFO reader
> nobody    2387  2383  0 05:06 ?        00:00:00 heartbeat: write: ucast eth0
> nobody    2388  2383  0 05:06 ?        00:00:00 heartbeat: read: ucast eth0
> nobody    2389  2383  0 05:06 ?        00:00:00 heartbeat: write: ucast eth0
> nobody    2390  2383  0 05:06 ?        00:00:00 heartbeat: read: ucast eth0
>
> Isn't it significant that programs like "crmd" are not running on drbd02?

Yes, very much so. For some reason the MCP (master control process)
doesn't start the rest of the programs which are doing the real
work. I really can't say why. Can you please attach the logs from
this node?

Thanks,

Dejan

> Thanks,
>
> --Amos
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
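
The difference between the two process listings can also be checked mechanically. Below is a minimal Python sketch, not part of heartbeat itself, that takes `ps -ef` style text and reports which daemons are absent. The function name `missing_daemons` and the `EXPECTED_DAEMONS` list are illustrative assumptions; the list is simply the set of direct children of the master control process seen in the drbd01 listing, minus mgmtd.

```python
import os

# Assumption: daemon names copied from the drbd01 listing in this thread.
# tengine and pengine are children of crmd there, and mgmtd is left out,
# so none of those three are treated as required here.
EXPECTED_DAEMONS = ("ccm", "cib", "lrmd", "stonithd", "attrd", "crmd")


def missing_daemons(ps_output, expected=EXPECTED_DAEMONS):
    """Return the expected daemons that do not appear in `ps -ef` text."""
    running = set()
    for line in ps_output.splitlines():
        # `ps -ef` columns: UID PID PPID C STIME TTY TIME CMD
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue  # skip blank or truncated lines
        executable = fields[7].split()[0]  # drop arguments such as "-r"
        running.add(os.path.basename(executable))
    return [name for name in expected if name not in running]
```

Fed the drbd02 listing, this would return all six names, which matches the observation that the master control process never started its children; fed the drbd01 listing, it would return an empty list.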
