On Fri, December 12, 2008 19:09, David Sommerseth wrote:
>
>
> David Sommerseth wrote:
>> [email protected] wrote:
>>> A PCI-X dual-port Broadcom NetXtreme BCM5704 Gigabit Ethernet (rev 03)
>>> adapter is working fine here, driven by tg3 on 2.6.27-hardened-r1. The
>>> driver doesn't seem to be borked with my card.
>>>
>>> Did you check the "errors" field of ifconfig's output for your card's
>>> interface?
>>>
>>> Regards,
>>> Dw.
>>
>> Hmmm ... No, I have not had that opportunity. The server is located
>> 2000 km away from me, and I usually call a guy (who is not a technician)
>> to go in and press CTRL-ALT-DEL on a keyboard. That is the short-term
>> "fix". But I'm going to have a look at the server in person in a couple
>> of weeks, so if I get positive feedback from others as well regarding the
>> 2.6.27 kernel, I'm willing to try that upgrade.
>>
>> This interface is an on-board interface in an IBM eServer. The first
>> time it happened, there had been no problems for about 28 days. This time
>> it was 13 days. So I expect it to happen again soon enough.
>>
>> I'll try to hack the shutdown scripts to dump the ifconfig info
>> somewhere somehow.
>
> Then it happened again ... and I have ifconfig stats for the interface:
>
> eth0      Link encap:Ethernet  HWaddr 00:14:5e:5d:3c:d0
>           inet6 addr: fe80::214:5eff:fe5d:3cd0/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:10551633 errors:4294967239 dropped:767 overruns:0 frame:170
>           TX packets:9371606 errors:4294967239 dropped:0 overruns:0 carrier:0
>           collisions:4294967239 txqueuelen:1000
>           RX bytes:28237000 (26.9 MiB)  TX bytes:163377979 (155.8 MiB)
>           Interrupt:16
>
> From the kernel log I see this:
>
> Dec 12 12:19:21 fw [74355.059369] tg3: tg3_abort_hw timed out for world, TX_MODE_ENABLE will not clear MAC_TX_MODE=ffffffff
> Dec 12 12:19:24 fw [74357.842979] tg3: world: No firmware running.
> Dec 12 12:19:41 fw [74374.992867] tg3: world: Link is down.
>
> I'm surprised by the error and collision numbers here, as I checked them
> the other day and all of them were 0. I also know that the TX and RX
> values were above 3-4 GB, but I don't remember which was which.
>
> Could this be an overflow bug of some kind?
>
> I have also found out that IBM has released updated firmware for this
> network device, so I'll try to upgrade it during Christmas when I'm close
> to the box again. In the meantime I have a little ping script which
> restarts the network (incl. reloading the tg3 module) when the network
> dies. This restart gives me minimal downtime.
>
> But I do not understand why this box was so rock solid until I upgraded
> from 2.6.22-hardened-r8 to 2.6.25-hardened-r8. The new kernel driver
> obviously does something it didn't do before. Unfortunately I can't find
> anything in the kernel git logs for the tg3.[ch] files that would
> pinpoint the cause.
>
>
> Does anyone have any experience with firmware upgrades on these
> cards? The instructions seem pretty straightforward, but if you know of
> anything at all, I would appreciate hearing about it.
>
>
> kind regards,
>
> David Sommerseth
>
Rather strange. The collision and error counters all show the same value...
It has been a long time since I last saw collisions.
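
On the overflow question: the RX errors, TX errors and collisions in the quoted ifconfig output all read 4294967239, which sits just below 2^32. That looks more like a small negative number stored in an unsigned field than like a real count. A minimal Python sketch of the arithmetic, assuming the counters are 32 bits wide (which I have not verified for this kernel build); the helper name as_signed32 is made up for illustration:

```python
# Interpret an ifconfig counter as a signed 32-bit value.
# Assumption: the counter lives in an unsigned 32-bit field.

WIDTH = 32
MODULUS = 1 << WIDTH  # 2**32 = 4294967296


def as_signed32(value: int) -> int:
    """Return the two's-complement reading of an unsigned 32-bit value."""
    value %= MODULUS
    return value - MODULUS if value >= MODULUS // 2 else value


# Value reported for RX errors, TX errors and collisions in the quoted output.
reported = 4294967239

print(MODULUS - reported)     # distance below the 32-bit wrap point
print(as_signed32(reported))  # the same bits read as a signed number
```

If the assumption holds, the three counters are the same small negative value, which would point at one bogus source feeding all of them rather than at real collisions on the wire.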

There are several possible explanations for this symptom. It would be
important to know whether the card is connected to a hub or to a switch.

1.) There may be a defective device on the subnet, either one that is
connected only from time to time, or one that is always present but
doesn't hog the line constantly.
2.) The switch/hub may have a problem; try moving the card to another
port.
3.) The network card itself may have a problem, which could be software
related and might be solved by a firmware upgrade (unfortunately the card
cannot be replaced, being an on-board NIC).
4.) It may even be caused by a driver bug, which we know is entirely
possible since the e1000 issue.

I hope the cause turns up soon. I would suspect a hardware issue, but it is
a disturbing fact that these symptoms appeared after a kernel upgrade.

Here's my ifconfig output for reference:

bond0     Link encap:Ethernet  HWaddr 00:10:18:06:ce:24
          inet addr:195.111.75.211  Bcast:195.111.75.255  Mask:255.255.255.192
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:9285671 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1681056 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2100416838 (1.9 GiB)  TX bytes:1298939064 (1.2 GiB)

eth0      Link encap:Ethernet  HWaddr 00:10:18:06:ce:24
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:5395008 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1681040 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1529378855 (1.4 GiB)  TX bytes:1298937508 (1.2 GiB)
          Interrupt:20

eth1      Link encap:Ethernet  HWaddr 00:10:18:06:ce:24
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:3890663 errors:0 dropped:0 overruns:0 frame:0
          TX packets:16 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:571037983 (544.5 MiB)  TX bytes:1556 (1.5 KiB)
          Interrupt:21

lspci:

00:08.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 03)
00:08.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 03)
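
If dumping the ifconfig output from a shutdown script turns out to be awkward, the same kind of counters can usually be read as plain numbers from sysfs. A rough Python sketch, assuming the kernel exposes /sys/class/net/<iface>/statistics/; the function names, the default counter list and the 2^31 threshold are my own choices for illustration, not anything taken from tg3:

```python
import os


def read_counters(iface, root="/sys/class/net",
                  names=("rx_errors", "tx_errors", "collisions", "rx_dropped")):
    """Return {counter_name: int} for one interface, skipping unreadable files."""
    stats_dir = os.path.join(root, iface, "statistics")
    counters = {}
    for name in names:
        path = os.path.join(stats_dir, name)
        try:
            with open(path) as fh:
                counters[name] = int(fh.read().strip())
        except (OSError, ValueError):
            # Missing file or non-numeric content: leave this counter out.
            continue
    return counters


def suspicious(counters, threshold=1 << 31):
    """Return the sorted names of counters at or above the threshold."""
    return sorted(name for name, value in counters.items() if value >= threshold)
```

Calling read_counters("eth0") and passing the result to suspicious() would flag values like the 4294967239 above, and the dictionary can be appended to a log file before the network is restarted.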
Regards,
Dw.
--
dr Tóth Attila, Radiológus, 06-20-825-8057, 06-30-5962-962
Attila Toth MD, Radiologist, +36-20-825-8057, +36-30-5962-962