Patrick von der Hagen wrote:
> Hi all,
>
> I'm installing a new cluster based on RHEL5. After compiling the
> source RPMs and installing heartbeat, setup was quite easy and I soon
> had a cluster of 3 nodes up and running.
>
> However, after restarting a server, heartbeat did not come up properly.
>
> It looks like heartbeat starts up, then Xen reconfigures the network
> devices (the Xen init scripts run after heartbeat), and heartbeat
> loses its interface:
> Apr 16 17:16:14 mailin1 heartbeat: [2460]: ERROR: glib: Unable to send
> [-1] ucast packet: No such device
> Apr 16 17:16:14 mailin1 heartbeat: [2460]: ERROR: write failure on ucast
> eth0.: No such device
>
> I'm not sure if this is actually a real heartbeat problem. Other daemons
> like sshd are started before heartbeat, so they should (in theory) suffer
> the same problem, but they seem to be completely unaffected. Even if this
> is not considered a heartbeat problem, I thought I should mention it
> here because I expect others to hit the same issue.
>
> I could have tried to shuffle the init scripts around so that heartbeat
> runs properly with Xen, but I have chosen to run my servers with
> non-Xen kernels instead.
>
> I haven't seen any problems in non-Xen mode yet.
> The current kernel is 2.6.18-8.1.1.el5; it was 2.6.18-8.el5xen before.
>
> Apr 16 17:16:09 mailin1 logd: [2427]: info: logd started with default
> configuration.
> Apr 16 17:16:09 mailin1 logd: [2433]: info: G_main_add_SignalHandler:
> Added signal handler for signal 15
> Apr 16 17:16:09 mailin1 logd: [2427]: info: G_main_add_SignalHandler:
> Added signal handler for signal 15
> Apr 16 17:16:09 mailin1 heartbeat: [2448]: info: No log entry found in
> ha.cf -- use logd
> Apr 16 17:16:09 mailin1 heartbeat: [2448]: info: Enabling logging
> daemon
> Apr 16 17:16:09 mailin1 heartbeat: [2448]: info: logfile and debug file
> are those specified in logd config file (default /etc/logd.cf)
> Apr 16 17:16:09 mailin1 heartbeat: [2448]: info:
> **************************
> Apr 16 17:16:09 mailin1 heartbeat: [2448]: info: Configuration
> validated. Starting heartbeat 2.0.8
> Apr 16 17:16:09 mailin1 heartbeat: [2449]: info: heartbeat: version
> 2.0.8
> Apr 16 17:16:09 mailin1 heartbeat: [2449]: info: Heartbeat generation:
> 10
> Apr 16 17:16:09 mailin1 heartbeat: [2449]: info:
> G_main_add_TriggerHandler: Added signal manual handler
> Apr 16 17:16:09 mailin1 heartbeat: [2449]: info:
> G_main_add_TriggerHandler: Added signal manual handler
> Apr 16 17:16:09 mailin1 heartbeat: [2449]: info:
> Removing /var/run/heartbeat/rsctmp failed, recreating.
> Apr 16 17:16:09 mailin1 heartbeat: [2449]: info: glib: ucast: write
> socket priority set to IPTOS_LOWDELAY on eth0
> Apr 16 17:16:09 mailin1 heartbeat: [2449]: info: glib: ucast: bound send
> socket to device: eth0
> Apr 16 17:16:09 mailin1 heartbeat: [2449]: info: glib: ucast: bound
> receive socket to device: eth0
> Apr 16 17:16:09 mailin1 heartbeat: [2449]: info: glib: ucast: started on
> port 694 interface eth0 to 129.13.185.82
> Apr 16 17:16:09 mailin1 heartbeat: [2449]: info: glib: ucast: write
> socket priority set to IPTOS_LOWDELAY on eth0
> Apr 16 17:16:09 mailin1 heartbeat: [2449]: info: glib: ucast: bound send
> socket to device: eth0
> Apr 16 17:16:09 mailin1 heartbeat: [2449]: info: glib: ucast: bound
> receive socket to device: eth0
> Apr 16 17:16:09 mailin1 heartbeat: [2449]: info: glib: ucast: started on
> port 694 interface eth0 to 129.13.185.83
> Apr 16 17:16:09 mailin1 heartbeat: [2449]: info:
> G_main_add_SignalHandler: Added signal handler for signal 17
> Apr 16 17:16:09 mailin1 heartbeat: [2449]: info: Local status now set
> to: 'up'
> Apr 16 17:16:09 mailin1 gpm[2472]: *** info [startup.c(95)]:
> Apr 16 17:16:09 mailin1 gpm[2472]: Started gpm successfully. Entered
> daemon mode.
> Apr 16 17:16:09 mailin1 rhnsd[2574]: Red Hat Network Services Daemon
> starting up.
> Apr 16 17:16:10 mailin1 heartbeat: [2449]: info: Link mailin2:eth0 up.
> Apr 16 17:16:10 mailin1 heartbeat: [2449]: info: Status update for node
> mailin2: status active
> Apr 16 17:16:10 mailin1 heartbeat: [2449]: info: Link mailin3:eth0 up.
> Apr 16 17:16:10 mailin1 heartbeat: [2449]: info: Status update for node
> mailin3: status active
> Apr 16 17:16:10 mailin1 heartbeat: [2449]: info: Comm_now_up(): updating
> status to active
> Apr 16 17:16:10 mailin1 heartbeat: [2449]: info: Local status now set
> to: 'active'
> [...]
> Apr 16 17:16:13 mailin1 xenstored: Checking store ...
> Apr 16 17:16:13 mailin1 xenstored: Checking store complete.
> Apr 16 17:16:13 mailin1 xenstored: Checking store ...
> Apr 16 17:16:13 mailin1 xenstored: Checking store complete.
> Apr 16 17:16:14 mailin1 kernel: Bridge firewalling registered
> Apr 16 17:16:14 mailin1 cib: [2592]: WARN: init_start: CCM Activation
> failed
> Apr 16 17:16:14 mailin1 cib: [2592]: WARN: init_start: CCM Connection
> failed 4 times (30 max)
> Apr 16 17:16:14 mailin1 kernel: device vif0.0 entered promiscuous mode
> Apr 16 17:16:14 mailin1 kernel: audit(1176736574.278:3): dev=vif0.0
> prom=256 old_prom=0 auid=4294967295
> Apr 16 17:16:14 mailin1 kernel: xenbr0: port 1(vif0.0) entering learning
> state
> Apr 16 17:16:14 mailin1 kernel: xenbr0: topology change detected,
> propagating
> Apr 16 17:16:14 mailin1 kernel: xenbr0: port 1(vif0.0) entering
> forwarding state
> Apr 16 17:16:14 mailin1 heartbeat: [2460]: ERROR: glib: Unable to send
> [-1] ucast packet: No such device
> Apr 16 17:16:14 mailin1 heartbeat: [2460]: ERROR: write failure on ucast
> eth0.: No such device
> Apr 16 17:16:14 mailin1 heartbeat: [2460]: ERROR: glib: Unable to send
> [-1] ucast packet: No such device
> Apr 16 17:16:14 mailin1 heartbeat: [2460]: ERROR: write failure on ucast
> eth0.: No such device
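
I've seen this happen when someone had DHCP running on the link.

We bind to that specific interface (eth0 here), so we are affected if it
is changed, renamed, or removed. Your log shows the "No such device"
errors starting right after Xen sets up xenbr0, which fits that pattern.

Lots of other daemons, sshd included, aren't affected because they don't
bind to a device and rely exclusively on routing.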
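
To illustrate the difference, here is a rough Python sketch (not
heartbeat's actual code, which is C). It assumes the "bound send socket
to device" lines in the log correspond to the Linux SO_BINDTODEVICE
socket option, which this thread does not confirm. The function names
are made up for the example. The idea is to remember the interface index
at bind time and notice when the name later disappears or points at a
different device.

```python
import socket


def interface_index(name):
    """Return the kernel index of interface `name`, or None if it is absent."""
    try:
        return socket.if_nametoindex(name)
    except OSError:
        return None


def interface_changed(name, index_at_bind):
    """True if `name` is gone or now refers to a different device."""
    return interface_index(name) != index_at_bind


def open_bound_udp_socket(name):
    """Open a UDP socket tied to one device (Linux only).

    Usually needs root or CAP_NET_RAW. Returns the socket together with
    the interface index seen at bind time.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE,
                        name.encode() + b"\0")
    except OSError:
        sock.close()
        raise
    return sock, interface_index(name)
```

A daemon using something like this could call interface_changed("eth0",
index) when a send fails with "No such device" and reopen its sockets
instead of logging the same error repeatedly. A daemon that binds only to
an address or a wildcard never holds a reference to the device, which is
why it keeps working once routing is back.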
--
Alan Robertson <[EMAIL PROTECTED]>
"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William
Wilberforce
_______________________________________________
Linux-HA mailing list
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems