Re: [Linux-HA] heartbeat 2.0.8: causing nfs kernel oops

Gerry Reno Tue, 01 May 2007 10:49:33 -0700

Alan Robertson wrote:

Gerry Reno wrote:

I'm seeing some very strange things lately.  Whenever heartbeat is
running there are these messages in the log:
Apr 30 19:38:08 grp-01-30-01 heartbeat: [2533]: ERROR: write failure on
bcast eth0.: No such device
Apr 30 19:38:08 grp-01-30-01 heartbeat: [2533]: ERROR: glib: Unable to
send bcast [-1] packet(len=214): No such device
Apr 30 19:38:08 grp-01-30-01 heartbeat: [2533]: ERROR: MSG: Dumping
message with 10 fields
Apr 30 19:38:08 grp-01-30-01 heartbeat: [2533]: ERROR: MSG[0] :
[t=NS_ackmsg]
Apr 30 19:38:08 grp-01-30-01 heartbeat: [2533]: ERROR: MSG[1] :
[dest=grp-01-30-02]
Apr 30 19:38:08 grp-01-30-01 heartbeat: [2533]: ERROR: MSG[2] :
[ackseq=40cd2]
Apr 30 19:38:08 grp-01-30-01 heartbeat: [2533]: ERROR: MSG[3] :
[(1)destuuid=0x835cfc8(37 28)]
Apr 30 19:38:08 grp-01-30-01 heartbeat: [2533]: ERROR: MSG[4] :
[src=grp-01-30-01]
Apr 30 19:38:08 grp-01-30-01 heartbeat: [2533]: ERROR: MSG[5] :
[(1)srcuuid=0x8361848(36 27)]
Apr 30 19:38:08 grp-01-30-01 heartbeat: [2533]: ERROR: MSG[6] : [hg=a1]
Apr 30 19:38:08 grp-01-30-01 heartbeat: [2533]: ERROR: MSG[7] :
[ts=46367de0]
Apr 30 19:38:08 grp-01-30-01 heartbeat: [2533]: ERROR: MSG[8] : [ttl=4]
Apr 30 19:38:08 grp-01-30-01 heartbeat: [2533]: ERROR: MSG[9] : [auth=1
dcf0feb393f46354b060306713eb72adc15eecf3]

Yet in most other respects eth0 seems to behave perfectly normally.
I even went so far as to swap out the NIC for eth0, with the same
result.  I can ping, ftp, ssh, etc. using eth0 with no problems.  Where
I do see a problem is with NFS.  If I mount a remote NFS share and
try to push a compressed tar to the NFS-mounted directory, after about
1GB of transfer I get a kernel oops in the NFS code.  Now, if I shut down
heartbeat and perform the same compressed tar, it completes correctly
without any oops.  So I'm baffled by this.  Is there any known problem
that would cause the above log messages on an otherwise perfectly good
network connection and also cause some type of interaction with NFS?

This problem seems to follow the primary node.  In other words, the
lockup occurs on whichever node has the primary IPaddr.  I can post the
log, but it's hundreds of megabytes of this same message.


Yes.

Running DHCP on a network link.  Taking the link down manually.  Other
things that involve messing around with eth0.
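
One rough way to check whether something is taking eth0 away underneath
heartbeat is to poll the interface state and note when it changes.  The
sketch below is not part of heartbeat; it assumes a Linux system that
exposes /sys/class/net/<ifname>/operstate, and the function names
link_state and watch are made up for illustration.

```python
import os
import time


def link_state(ifname, sysfs_root="/sys/class/net"):
    """Return the interface's operstate, or 'missing' if it has no sysfs entry.

    Assumes the kernel exposes <sysfs_root>/<ifname>/operstate.
    """
    path = os.path.join(sysfs_root, ifname, "operstate")
    try:
        with open(path) as f:
            return f.read().strip()
    except (FileNotFoundError, NotADirectoryError):
        return "missing"


def watch(ifname, polls, interval=1.0, sysfs_root="/sys/class/net"):
    """Poll the interface `polls` times and return a list of state changes.

    Each entry is (timestamp, state); only transitions are recorded, so a
    stable link yields a single entry.
    """
    changes = []
    last = None
    for _ in range(polls):
        state = link_state(ifname, sysfs_root)
        if state != last:
            changes.append((time.strftime("%b %d %H:%M:%S"), state))
            last = state
        time.sleep(interval)
    return changes
```

Calling watch("eth0", polls=600) while the tar is running and comparing
the recorded timestamps against the "No such device" errors in the
heartbeat log might show whether the interface is being reconfigured
(by a DHCP client, for instance) at those moments.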

Alan,

Where do you think this problem lies?  Is it a kernel problem or a
heartbeat problem?  Is this something that is being, has been, or can be
addressed by the heartbeat team?  Is there a workaround or fix?  This
problem greatly interferes with other network activities that need to
take place on our servers, such as backups.  That is how I discovered
it: none of the backups were completing overnight, and the whole machine
would be locked up due to the kernel oops.


thx,
-Gerry


_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
