On 03/24/2011 02:29 PM, Bill Bogstad wrote:
> On Thu, Mar 24, 2011 at 10:25 AM, Jerry Feldman <[email protected]> wrote:
>> I have a system where the NICs tend to go offline every few days
>> (probably a couple of weeks). I've been looking at the logs for a
>> possible indication of problems, but I'm sure it's a motherboard issue.
>> One thing I'm seeing in the logs is when the NICs fail (I have both NICs
>> with IP addresses to see if one NIC fails and the other stays up) but
>> both fail simultaneously.
>>
>> The relevant log entries are below. The first at 00:01:41 indicates the
>> failure, but the second one 6 minutes later indicates a successful NTP
>> sync. The next 2 log entries just confirm the NICs have failed. I have a
>> script running on that box to give me some additional info, but it did
>> not give me what I want. Note that I have VMWare server 2.0 running on
>> this box, but we are planning to move VMWare off to another dedicated
>> machine that is on order. My script is just a simple script that does a
>> ping and logs success or failure. Rather than fill up the logs, the
>> script edits the log with the intent I want to know the time of the
>> first and most recent failure.
>>
>> Mar 24 00:01:41 boslc06 automount[4263]: host bosnas2: lookup failure 2
>> Mar 24 00:07:02 boslc06 ntpd[4465]: synchronized to 64.73.32.134, stratum 2
>> Mar 24 00:45:06 boslc06 automount[4263]: set_tsd_user_vars: failed to
>> get passwd info from getpwuid_r
>> Mar 24 00:45:37 boslc06 automount[4263]: host bosnas2: lookup failure 2
> Not clear if you want to investigate this further, but you might try
> modifying your ping script to gather more information when a failure
> occurs. Perhaps a "arp -an" to see what is in the ARP cache.
> "tcpdump/tshark -w" to capture any packets that are traversing that
> interface. If you use a "-c", you can limit the number of packets saved
> so you won't fill up the disk. This might tell you if the failure is in
> both directions or in just one. Use mii-diag/mii-tool/ethtool to capture
> the state of Ethernet speed/duplex negotiation from the perspective of
> the host. You don't report any errors from the kernel about actual
> interface errors which is a bit odd. That implies the kernel thought it
> was successful on outgoing packets. Try running "dmesg" as well from
> your script to check on this.
>
> Good Luck,
> Bill Bogstad
>
Thanks for the ideas. I've added dmesg to the script on the first
instance of a failure.
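
For anyone following along, here is a rough sketch of the kind of monitor discussed above: it pings a host, remembers only the first and most recent failure times, and captures diagnostics once, on the first failure. This is not the actual script from this thread. The target address, the interface name eth0, and every function and class name below are assumptions made for illustration; the ping flags assume Linux iputils.

```python
import subprocess
import time
from typing import Callable, Optional

# Hypothetical values: the thread names neither the ping target nor the interface.
TARGET = "192.0.2.1"
DIAG_COMMANDS = [
    ["dmesg"],            # kernel messages, e.g. link or driver errors
    ["arp", "-an"],       # current ARP cache
    ["ethtool", "eth0"],  # link state and speed/duplex negotiation
]


def ping_ok(host: str) -> bool:
    """Return True if a single ping to host succeeds."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", "2", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        return False
    return result.returncode == 0


def collect_diagnostics() -> str:
    """Run each diagnostic command and return the combined output."""
    parts = []
    for cmd in DIAG_COMMANDS:
        try:
            out = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
            parts.append(f"$ {' '.join(cmd)}\n{out.stdout}{out.stderr}")
        except (OSError, subprocess.TimeoutExpired) as exc:
            parts.append(f"$ {' '.join(cmd)}\nfailed to run: {exc}")
    return "\n".join(parts)


class FailureTracker:
    """Keep only the first and most recent failure times."""

    def __init__(self, diagnose: Callable[[], str] = collect_diagnostics):
        self.first_failure: Optional[float] = None
        self.last_failure: Optional[float] = None
        self.diagnostics: Optional[str] = None
        self._diagnose = diagnose

    def record(self, ok: bool, now: float) -> None:
        """Update state for one ping result observed at time now."""
        if ok:
            return
        if self.first_failure is None:
            # First failure: capture system state once, while it is fresh.
            self.first_failure = now
            self.diagnostics = self._diagnose()
        self.last_failure = now

    def summary(self) -> str:
        """Return a two-line report instead of one log line per failure."""
        if self.first_failure is None:
            return "no failures recorded"
        fmt = "%Y-%m-%d %H:%M:%S"
        first = time.strftime(fmt, time.localtime(self.first_failure))
        last = time.strftime(fmt, time.localtime(self.last_failure))
        return f"first failure: {first}\nmost recent failure: {last}"


def check_once(tracker: FailureTracker, host: str = TARGET) -> None:
    """Ping once and feed the result to the tracker."""
    tracker.record(ping_ok(host), time.time())
```

Calling check_once() in a loop with a sleep between iterations, and writing tracker.summary() and tracker.diagnostics to a file, keeps the log small while still preserving the state of the box at the moment things first went wrong. A bounded "tcpdump -c N -w file" could be added to the command list as well, though it needs root and may wait a while if no packets arrive.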
--
Jerry Feldman <[email protected]>
Boston Linux and Unix
PGP key id: 537C5846
PGP Key fingerprint: 3D1B 8377 A3C0 A5F2 ECBB CA3B 4607 4319 537C 5846
