On 03/24/2011 02:29 PM, Bill Bogstad wrote:
> On Thu, Mar 24, 2011 at 10:25 AM, Jerry Feldman <[email protected]> wrote:
>> I have a system where the NICs tend to go offline every few days
>> (probably a couple of weeks). I've been looking at the logs for a
>> possible indication of problems, but I'm sure it's a motherboard issue.
>> I have IP addresses on both NICs to see whether one stays up when the
>> other fails, but both fail simultaneously.
>>
>> The relevant log entries are below. The first at 00:01:41 indicates the
>> failure, but the second one 6 minutes later indicates a successful NTP
>> sync. The next 2 log entries just confirm the NICs have failed. I have a
>> script running on that box to give me some additional info, but it did
>> not give me what I want. Note that I have VMware Server 2.0 running on
>> this box, but we are planning to move VMware off to another dedicated
>> machine that is on order. My script is just a simple script that does a
>> ping and logs success or failure. Rather than fill up the logs, the
>> script rewrites its log so that it keeps only the times of the first
>> and most recent failures.
>>
>> Mar 24 00:01:41 boslc06 automount[4263]: host bosnas2: lookup failure 2
>> Mar 24 00:07:02 boslc06 ntpd[4465]: synchronized to 64.73.32.134, stratum 2
>> Mar 24 00:45:06 boslc06 automount[4263]: set_tsd_user_vars: failed to get passwd info from getpwuid_r
>> Mar 24 00:45:37 boslc06 automount[4263]: host bosnas2: lookup failure 2
> Not clear if you want to investigate this further, but you might try
> modifying your ping script to gather more information when a failure
> occurs. Perhaps an "arp -an" to see what is in the ARP cache, and
> "tcpdump/tshark -w" to capture any packets that are traversing that
> interface. If you use "-c", you can limit the number of packets saved
> so you won't fill up the disk. This might tell you if the failure is
> in both directions or in just one. Use mii-diag/mii-tool/ethtool to
> capture the state of Ethernet speed/duplex negotiation from the
> perspective of the host. You don't report any kernel messages about
> actual interface errors, which is a bit odd. That implies the kernel
> thought it was successful on outgoing packets. Try running "dmesg" as
> well from your script to check on this.
>
> Good Luck,
> Bill Bogstad
>
Thanks for the ideas. I've changed the script so that it runs dmesg on
the first instance of a failure.
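
For anyone who wants to try something similar, here is a rough sketch of
that kind of watchdog. It is not the actual script from this thread. The
target address, interface name, and file paths are hypothetical
placeholders, and the ping flags assume Linux iputils. The log is
rewritten on each failure so it holds only the first and most recent
failure times, and the diagnostics Bill suggested are captured once, on
the first failure.

```python
import subprocess
import time
from pathlib import Path

# All of these values are hypothetical placeholders; adjust for your box.
TARGET = "192.0.2.1"                        # host to ping
IFACE = "eth0"                              # interface under suspicion
STATE = Path("/var/tmp/nic-watch.log")      # first / most recent failure
DIAG = Path("/var/tmp/nic-watch.diag")      # diagnostic output
PCAP = "/var/tmp/nic-watch.pcap"            # packet capture file

# Commands to run once, when the first failure is seen.
DIAG_COMMANDS = [
    ["dmesg"],
    ["arp", "-an"],
    ["ethtool", IFACE],
    # -c caps the number of packets saved; -U flushes each packet to the
    # file so a capture cut short by the timeout still keeps what it saw.
    ["tcpdump", "-i", IFACE, "-U", "-c", "200", "-w", PCAP],
]


def ping_ok(host):
    """Return True if a single ping to host succeeds (Linux ping flags)."""
    try:
        done = subprocess.run(
            ["ping", "-c", "1", "-W", "2", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return done.returncode == 0


def record_failure(path, stamp):
    """Rewrite the log as two lines: first failure, most recent failure.

    Returns True if this call recorded the first failure.
    """
    previous = path.read_text().splitlines() if path.exists() else []
    is_first = not previous
    first = stamp if is_first else previous[0]
    path.write_text(f"{first}\n{stamp}\n")
    return is_first


def collect_diagnostics(path, commands, timeout=30):
    """Append the output of each command to path, noting any that fail."""
    with path.open("a") as out:
        for cmd in commands:
            out.write(f"--- {' '.join(cmd)} ---\n")
            try:
                done = subprocess.run(
                    cmd, capture_output=True, text=True, timeout=timeout
                )
                out.write(done.stdout + done.stderr)
            except (OSError, subprocess.TimeoutExpired) as exc:
                out.write(f"could not run: {exc}\n")


def check_once():
    """One pass: ping, log a failure, grab diagnostics on the first one."""
    if ping_ok(TARGET):
        return
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    if record_failure(STATE, stamp):
        collect_diagnostics(DIAG, DIAG_COMMANDS)
```

Calling check_once() from cron every minute, or from a loop with a
sleep, would be one way to run it. Most of the diagnostic commands need
root, and the log file has to be removed by hand after an outage so the
next one counts as a "first" failure again.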

-- 
Jerry Feldman <[email protected]>
Boston Linux and Unix
PGP key id: 537C5846
PGP Key fingerprint: 3D1B 8377 A3C0 A5F2 ECBB  CA3B 4607 4319 537C 5846


_______________________________________________
Discuss mailing list
[email protected]
http://lists.blu.org/mailman/listinfo/discuss