On Fri, December 12, 2008 19:09, David Sommerseth wrote:
>
>
> David Sommerseth wrote:
>> [email protected] wrote:
>>> PCI-X dual port Broadcom NetXtreme BCM5704 Gigabit Ethernet (rev 03)
>>> adapter is working fine here driven by tg3, 2.6.27-hardened-r1. The
>>> driver doesn't seem to be borked with my card.
>>>
>>> Did you check the "errors" fields of ifconfig's output for the
>>> interface of your card?
>>>
>>> Regards,
>>> Dw.
>>
>> Hmmm ... No, I have not had that opportunity.  The server is located
>> 2000km away from me, and I usually call a guy (who is not a
>> technician) to go in and press CTRL-ALT-DEL on a keyboard.  That is
>> the short-term "fix".  But I'm going to have a look at the server
>> physically in a couple of weeks, so if I get positive feedback from
>> others as well regarding the 2.6.27 kernel, I'm willing to try that
>> upgrade.
>>
>> This interface is an on-board interface in an IBM eServer.  The first
>> time it happened, there had been no problems for about 28 days.  This
>> time it was 13 days.  So I expect it to happen again soon enough.
>>
>> I'll try to hack the shutdown scripts to dump the ifconfig info
>> somewhere somehow.
>
> Then it happened again ... and I have ifconfig stats for the interface:
>
> eth0      Link encap:Ethernet  HWaddr 00:14:5e:5d:3c:d0
>            inet6 addr: fe80::214:5eff:fe5d:3cd0/64 Scope:Link
>            UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>            RX packets:10551633 errors:4294967239 dropped:767 overruns:0 frame:170
>            TX packets:9371606 errors:4294967239 dropped:0 overruns:0 carrier:0
>            collisions:4294967239 txqueuelen:1000
>            RX bytes:28237000 (26.9 MiB)  TX bytes:163377979 (155.8 MiB)
>            Interrupt:16
>
>  From the kernel log I see this:
>
> Dec 12 12:19:21 fw [74355.059369] tg3: tg3_abort_hw timed out for world, TX_MODE_ENABLE will not clear MAC_TX_MODE=ffffffff
> Dec 12 12:19:24 fw [74357.842979] tg3: world: No firmware running.
> Dec 12 12:19:41 fw [74374.992867] tg3: world: Link is down.
>
> I'm surprised by the error and collision numbers here, as I checked
> them the other day and all of them were 0.  I also know that the TX
> and RX values were above 3-4GB, but I don't remember which was which.
>
> Could this be an overflow bug of some kind?
>
> I have also found out that IBM has released updated firmware for this
> network device, so I'll try to upgrade it during Christmas when I'm close
> to the box again.  In the meantime I have a little ping script, which
> restarts the network (incl. reloading the tg3 module) when the network
> dies.  This restart gives me minimal downtime.
>
> But I do not understand why this box was so rock solid until I upgraded
> from 2.6.22-hardened-r8 to 2.6.25-hardened-r8.  The new kernel driver
> obviously does something it didn't do before.  Unfortunately I can't find
> anything in the kernel git logs for the tg3.[ch] files which could
> pinpoint the cause.
>
>
> Does anyone have any experience with firmware upgrades on these
> cards?  The instructions seem pretty straightforward, but if you know
> of anything at all, I would appreciate hearing it.
>
>
> kind regards,
>
> David Sommerseth
>

Rather strange. The collisions and errors counters show the same value...
It has been a long time since I last saw collisions.
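
Regarding the overflow question: 4294967239 is 2^32 - 57, so all three
counters look like a 32-bit value of -57 printed as unsigned rather than
real counts. A minimal Python sketch of that arithmetic (the helper name
as_signed32 is made up for illustration):

```python
def as_signed32(value):
    """Reinterpret an unsigned 32-bit counter value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


# Values copied from the ifconfig output quoted above.
counters = {
    "rx_errors": 4294967239,
    "tx_errors": 4294967239,
    "collisions": 4294967239,
}

for name, raw in counters.items():
    print(name, raw, as_signed32(raw))
```

That might fit a driver computing its statistics from registers that read
back garbage (the log shows MAC_TX_MODE=ffffffff), but this is only a guess.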

There are several possible causes for this symptom. It would be
important to know whether the card is connected to a hub or to a switch.
1.) There may be a defective device on the subnet, either connected only
from time to time, or present all the time without hogging the line
constantly.
2.) The switch/hub may have a problem - try connecting the card to
another port.
3.) The network card may have a problem, which could be software related
and might be solved by a firmware upgrade (unfortunately the card itself
cannot be replaced, being an on-board NIC).
4.) It could even be caused by a driver bug - which we know is entirely
possible since the e1000 issue.
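
To narrow this down, it might help to log the error counters periodically
instead of only at the moment of failure. A minimal Python sketch, assuming
the kernel exposes the per-interface counters under
/sys/class/net/<iface>/statistics; the function names and the selection of
fields are only illustrative:

```python
import os
import time

# Counter files assumed to exist under /sys/class/net/<iface>/statistics.
FIELDS = ("rx_errors", "tx_errors", "collisions", "rx_dropped", "rx_frame_errors")


def read_counters(iface, base="/sys/class/net"):
    """Return a dict of counter values; unreadable counters are None."""
    stats_dir = os.path.join(base, iface, "statistics")
    result = {}
    for field in FIELDS:
        try:
            with open(os.path.join(stats_dir, field)) as fh:
                result[field] = int(fh.read().strip())
        except (OSError, ValueError):
            result[field] = None
    return result


def format_line(iface, counters, now=None):
    """Build one timestamped log line from the counters."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    fields = " ".join(
        "%s=%s" % (name, "n/a" if counters[name] is None else counters[name])
        for name in FIELDS
    )
    return "%s %s %s" % (stamp, iface, fields)


if __name__ == "__main__":
    # Run from cron (or a loop) and append the output to a log file.
    print(format_line("eth0", read_counters("eth0")))
```

If the counters stay at 0 until the moment the link dies, that would point
away from a noisy device on the wire and more towards the card or driver.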

I hope the cause turns up soon. I would suspect a hardware issue, but it
is disturbing that these symptoms appeared after a kernel upgrade.

Here's my ifconfig output for reference:
bond0     Link encap:Ethernet  HWaddr 00:10:18:06:ce:24
          inet addr:195.111.75.211  Bcast:195.111.75.255  Mask:255.255.255.192
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:9285671 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1681056 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2100416838 (1.9 GiB)  TX bytes:1298939064 (1.2 GiB)

eth0      Link encap:Ethernet  HWaddr 00:10:18:06:ce:24
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:5395008 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1681040 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1529378855 (1.4 GiB)  TX bytes:1298937508 (1.2 GiB)
          Interrupt:20

eth1      Link encap:Ethernet  HWaddr 00:10:18:06:ce:24
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:3890663 errors:0 dropped:0 overruns:0 frame:0
          TX packets:16 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:571037983 (544.5 MiB)  TX bytes:1556 (1.5 KiB)
          Interrupt:21

lspci:
00:08.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 03)
00:08.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 03)

Regards,
Dw.
-- 
dr Tóth Attila, Radiológus, 06-20-825-8057, 06-30-5962-962
Attila Toth MD, Radiologist, +36-20-825-8057, +36-30-5962-962

