Hi,

On Wed, Jun 10, 2009 at 12:20:14PM +0200, jeroen groenewegen van der weyden wrote:
> Hi everybody,
>
> I just experienced a strange behavior: after manually rebooting our
> server, heartbeat did not come back into service. The message log
> shows "Retrying: Address already in use", but netstat shows nothing on port

Did you try lsof or fuser?

> 694. The nodes were able to see each other. On both nodes services were
> connecting using the same link (br0).
>
> A heartbeat stop/start did not help and resulted in the same log messages.
> After a second reboot the phenomenon was gone.
>
> heartbeat V2.99.2
> openSUSE 11.1
>
> Has anybody seen this before, or does anyone know the cause of it?

No. The only explanation I can imagine is that another process is
using this port.

Thanks,

Dejan

>
> best regards
>
> jeroen
>
> ====== log =========
> ClusterNode1:/ # tail /var/log/messages
> Jun 10 12:00:08 ClusterNode1 heartbeat: [5315]: ERROR: glib: ucast: error binding socket. Retrying: Address already in use
> Jun 10 12:00:09 ClusterNode1 heartbeat: [5315]: ERROR: glib: ucast: error binding socket. Retrying: Address already in use
> Jun 10 12:00:10 ClusterNode1 heartbeat: [5315]: ERROR: glib: ucast: error binding socket. Retrying: Address already in use
> Jun 10 12:00:11 ClusterNode1 heartbeat: [5315]: ERROR: glib: ucast: error binding socket. Retrying: Address already in use
> Jun 10 12:00:12 ClusterNode1 heartbeat: [5315]: ERROR: glib: ucast: error binding socket. Retrying: Address already in use
> Jun 10 12:00:13 ClusterNode1 heartbeat: [5315]: ERROR: glib: ucast: unable to bind socket. Giving up: Address already in use
> Jun 10 12:00:13 ClusterNode1 heartbeat: [5315]: ERROR: make_io_childpair: cannot open ucast br0
> Jun 10 12:00:14 ClusterNode1 heartbeat: [5317]: CRIT: Emergency Shutdown: Master Control process died.
> Jun 10 12:00:14 ClusterNode1 heartbeat: [5317]: CRIT: Killing pid 5315 with SIGTERM
> Jun 10 12:00:14 ClusterNode1 heartbeat: [5317]: CRIT: Emergency Shutdown(MCP dead): Killing ourselves.
>
>
> ========= netstat -ntlp ============
>
> ClusterNode1:/ # netstat -ntlp
> Active Internet connections (only servers)
> Proto Recv-Q Send-Q Local Address     Foreign Address   State    PID/Program name
> tcp        0      0 0.0.0.0:5801      0.0.0.0:*         LISTEN   4039/xinetd
> tcp        0      0 0.0.0.0:5901      0.0.0.0:*         LISTEN   4039/xinetd
> tcp        0      0 0.0.0.0:111       0.0.0.0:*         LISTEN   3063/rpcbind
> tcp        0      0 0.0.0.0:6004      0.0.0.0:*         LISTEN   4823/Xvnc
> tcp        0      0 0.0.0.0:22        0.0.0.0:*         LISTEN   3907/sshd
> tcp        0      0 127.0.0.1:631     0.0.0.0:*         LISTEN   3841/cupsd
> tcp        0      0 127.0.0.1:25      0.0.0.0:*         LISTEN   3868/master
> tcp        0      0 :::111            :::*              LISTEN   3063/rpcbind
> tcp        0      0 :::6004           :::*              LISTEN   4823/Xvnc
> tcp        0      0 :::22             :::*              LISTEN   3907/sshd
>
>
> ======= ha.cf ==========
>
> use_logd yes
> ucast br0 192.168.1.1
> ucast br0 192.168.1.2
> ucast br1 172.27.74.136
> ucast br1 172.27.74.137
> #serial /dev/ttyS0
> node ClusterNode1
> node ClusterNode2
> respawn root /usr/lib64/heartbeat/hbagent
> apiauth mgmtd uid=root
> respawn root /usr/lib64/heartbeat/mgmtd -v
> crm on
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
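
A note on the netstat output quoted in this message: the -t flag restricts the listing to TCP sockets, whereas heartbeat's ucast transport binds a UDP socket (port 694 unless configured otherwise), so a process holding that port would not appear there. Below is a minimal sketch for checking the UDP side, assuming net-tools netstat, lsof and fuser are installed. The helper name udp_port_users is hypothetical and not part of heartbeat or any of these tools.

```shell
#!/bin/sh
# Hypothetical helper (not part of heartbeat): read `netstat -nulp`-style
# output on stdin and print only the UDP sockets whose local address
# ends in the given port number.
udp_port_users() {
    port="$1"
    # Field 1 is the protocol (udp/udp6), field 4 is the local address.
    awk -v p=":$port" '$1 ~ /^udp/ && $4 ~ (p "$") { print }'
}

# Usage on a live node (run as root so the PID/Program column is filled in):
#   netstat -nulp | udp_port_users 694
# The tools suggested in the reply can be pointed at the same UDP port:
#   lsof -nP -iUDP:694
#   fuser -v 694/udp
```

If any of these report a process other than heartbeat on 694/udp, that process is the likely source of the "Address already in use" errors; if nothing is reported, the port was presumably released in the meantime.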
