A-HA!
I see on node2:
# netstat -lnup
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address    Foreign Address    State    PID/Program name
udp        0      0 0.0.0.0:694      0.0.0.0:*                   2214/rpc.statd
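
In case it helps anyone checking the same thing on several nodes, here is a minimal sketch in Python (my choice of language, not something heartbeat ships) that pulls the owner of a UDP port out of `netstat -lnup` text. The function name `udp_port_owner` is made up for illustration, and it assumes each socket entry sits on a single line, as netstat prints it before any mail wrapping.

```python
def udp_port_owner(netstat_output, port):
    """Return the PID/Program field of the first UDP listener on `port`, or None."""
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns for UDP listeners (the State column is empty):
        # Proto Recv-Q Send-Q Local-Address Foreign-Address PID/Program-name
        if len(fields) < 6 or not fields[0].startswith("udp"):
            continue
        # The local address looks like 0.0.0.0:694; keep what follows the last colon.
        local_port = fields[3].rsplit(":", 1)[-1]
        if local_port == str(port):
            # The program name may contain spaces (e.g. "4581/heartbeat: wri").
            return " ".join(fields[5:])
    return None
```

Pass it the captured output of `netstat -lnup` and the port number (694 here); it returns the PID/program string of whatever is listening, or None if nothing is.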
What is statd, and why is it using the port I need for heartbeat?
On node1, statd is listening on ports 690 and 693.
Heartbeat is on 694:
udp        0      0 0.0.0.0:694      0.0.0.0:*                   4581/heartbeat: wri
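
The "Address already in use" error in the log is what any program gets when it tries to bind a UDP port that another process already holds. A minimal sketch in Python that reproduces the check follows; the helper name `udp_port_is_free` is hypothetical, and this is only an illustration of the bind failure, not how heartbeat itself does it.

```python
import errno
import socket


def udp_port_is_free(port, host="0.0.0.0"):
    """Return True if a UDP socket can be bound to (host, port), False if it is taken."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind((host, port))
    except OSError as exc:
        # EADDRINUSE is the "Address already in use" condition.
        if exc.errno == errno.EADDRINUSE:
            return False
        # Anything else (e.g. permission denied for ports below 1024) is re-raised.
        raise
    finally:
        sock.close()
    return True
```

Ports below 1024 need root to bind, so calling `udp_port_is_free(694)` as an ordinary user raises a permission error rather than answering the question.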
Cameron
On Wed, Mar 17, 2010 at 1:57 PM, Cameron Smith <[email protected]> wrote:
> Thanks Andrew!
>
> Yes, I see that, but the two servers are identically configured; the only
> differences are IP and hostname. So why am I not getting that error on node1?
> Which address is the error referring to? The main shared IP? I don't
> understand how to troubleshoot from that error.
>
> Cameron
>
>
>
> On Wed, Mar 17, 2010 at 1:47 PM, Andrew Beekhof <[email protected]> wrote:
>
>> I wonder if this might be related:
>>
>> Mar 17 21:46:50 node2 heartbeat: [5289]: ERROR: glib: Error binding socket (Address already in use). Retrying.
>>
>>
>> On Wed, Mar 17, 2010 at 9:44 PM, Cameron Smith <[email protected]> wrote:
>> > Here is more info:
>> >
>> > In checking /var/log/messages:
>> >
>> > Mar 17 21:46:50 node2 heartbeat: [5288]: info: Version 2 support: false
>> > Mar 17 21:46:50 node2 heartbeat: [5288]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
>> > Mar 17 21:46:50 node2 heartbeat: [5288]: info: **************************
>> > Mar 17 21:46:50 node2 heartbeat: [5288]: info: Configuration validated. Starting heartbeat 2.1.3
>> > Mar 17 21:46:50 node2 heartbeat: [5289]: info: heartbeat: version 2.1.3
>> > Mar 17 21:46:50 node2 heartbeat: [5289]: info: Heartbeat generation: 1268877071
>> > Mar 17 21:46:50 node2 heartbeat: [5289]: ERROR: glib: Error binding socket (Address already in use). Retrying.
>> > Mar 17 21:46:51 node2 rpc.statd[2214]: recv_rply: can't decode RPC message!
>> > Mar 17 21:46:51 node2 heartbeat: [5289]: ERROR: glib: Error binding socket (Address already in use). Retrying.
>> > Mar 17 21:46:52 node2 heartbeat: [5289]: ERROR: glib: Error binding socket (Address already in use). Retrying.
>> > Mar 17 21:46:53 node2 rpc.statd[2214]: recv_rply: can't decode RPC message!
>> > Mar 17 21:46:53 node2 heartbeat: [5289]: ERROR: glib: Error binding socket (Address already in use). Retrying.
>> > Mar 17 21:46:54 node2 heartbeat: [5289]: ERROR: glib: Error binding socket (Address already in use). Retrying.
>> > Mar 17 21:46:55 node2 rpc.statd[2214]: recv_rply: can't decode RPC message!
>> > Mar 17 21:46:55 node2 heartbeat: [5289]: ERROR: glib: Error binding socket (Address already in use). Retrying.
>> > Mar 17 21:46:56 node2 heartbeat: [5289]: ERROR: glib: Error binding socket (Address already in use). Retrying.
>> > Mar 17 21:46:57 node2 rpc.statd[2214]: recv_rply: can't decode RPC message!
>> > Mar 17 21:46:57 node2 rpc.statd[2214]: recv_rply: can't decode RPC message!
>> > Mar 17 21:46:57 node2 heartbeat: [5289]: ERROR: glib: Error binding socket (Address already in use). Retrying.
>> > Mar 17 21:46:58 node2 heartbeat: [5289]: ERROR: glib: Error binding socket (Address already in use). Retrying.
>> > Mar 17 21:46:59 node2 rpc.statd[2214]: recv_rply: can't decode RPC message!
>> > Mar 17 21:46:59 node2 heartbeat: [5289]: ERROR: glib: Error binding socket (Address already in use). Retrying.
>> > Mar 17 21:47:00 node2 heartbeat: [5289]: ERROR: glib: Unable to bind socket (Address already in use). Giving up.
>> > Mar 17 21:47:00 node2 heartbeat: [5289]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth0 - Status: 1
>> > Mar 17 21:47:00 node2 heartbeat: [5289]: ERROR: make_io_childpair: cannot open bcast eth0
>> > Mar 17 21:47:01 node2 rpc.statd[2214]: recv_rply: can't decode RPC message!
>> > Mar 17 21:47:01 node2 heartbeat: [5292]: CRIT: Emergency Shutdown: Master Control process died.
>> > Mar 17 21:47:01 node2 heartbeat: [5292]: CRIT: Killing pid 5289 with SIGTERM
>> > Mar 17 21:47:01 node2 heartbeat: [5292]: CRIT: Emergency Shutdown(MCP dead): Killing ourselves.
>> > Mar 17 21:47:03 node2 rpc.statd[2214]: recv_rply: can't decode RPC message!
>> >
>> > Why is it doing that???
>> >
>> > Thanks!
>> > Cameron
>> >
>> > On Wed, Mar 17, 2010 at 1:03 PM, Cameron Smith <[email protected]> wrote:
>> >
>> >> I'm a brand new heartbeat user running my first test, and I have run
>> >> into a problem.
>> >>
>> >> I installed heartbeat on node1 and node2.
>> >>
>> >> It works fine on node1 and serves a webpage on the eth0:0 IP, but
>> >> heartbeat won't stay running on node2.
>> >>
>> >> When I start heartbeat I get:
>> >> # service heartbeat start
>> >> logd is already running
>> >> Starting High-Availability services:
>> >> 2010/03/17_21:04:44 INFO: Resource is stopped
>> >> [ OK ]
>> >>
>> >>
>> >> and then on a status I get:
>> >> heartbeat OK [pid 5093 et al] is running on node2.example.net [node2.example.net]
>> >>
>> >> but then after a short amount of time the status check will return:
>> >> # service heartbeat status
>> >> heartbeat is stopped. No process
>> >>
>> >> Why is this happening?
>> >>
>> >> I followed these instructions:
>> >> http://www.howtoforge.com/high_availability_heartbeat_centos
>> >>
>> >> I am running heartbeat 2.1.3
>> >>
>> >> Thanks!
>> >> Cameron
>> >>
>> >>
>> > _______________________________________________
>> > Linux-HA mailing list
>> > [email protected]
>> > http://lists.linux-ha.org/mailman/listinfo/linux-ha
>> > See also: http://linux-ha.org/ReportingProblems
>> >
>>
>
>