Carl-Daniel Hailfinger wrote:
> Hi,
>
> Carl-Daniel Hailfinger wrote:
>
>>after sending 259 GB and receiving 25 GB over my SysKonnect SK-9E21
>>card (sky2 says it is a "Yukon-EC (0xb6) rev 1"), the card appears
>>dead. Machine is an Athlon64 3200+ on an Asus A8N-SLI Deluxe board.
>>
>>sky2 v0.11 addr 0xc9000000 irq 74 Yukon-EC (0xb6) rev 1
>>sky2 eth3: addr 00:00:5a:70:30:fb
>>[...]
>>sky2 eth3: enabling interface
>>[...]
>>sky2 eth3: phy interrupt status 0x1c40 0x7d0c
>>sky2 eth3: Link is up at 100 Mbps, full duplex, flow control both
>>[...]
>>NETDEV WATCHDOG: eth3: transmit timed out
>>sky2 eth3: tx timeout
>>NETDEV WATCHDOG: eth3: transmit timed out
>>sky2 eth3: tx timeout
>>
>>
>>switch:~ # ifconfig eth3
>>eth3 Link encap:Ethernet HWaddr 00:00:5A:70:30:FB
>> inet6 addr: fe80::200:5aff:fe70:30fb/64 Scope:Link
>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>> RX packets:130530358 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:209647800 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:1000
>> RX bytes:25980735946 (24777.1 Mb) TX bytes:259787058579 (247752.2 Mb)
>> Interrupt:74
>>
>>switch:~ # cat /proc/interrupts
>> CPU0
>> 0: 11213627 IO-APIC-edge timer
>> 1: 24783 IO-APIC-edge i8042
>> 8: 0 IO-APIC-edge rtc
>> 9: 0 IO-APIC-level acpi
>> 15: 401558 IO-APIC-edge ide1
>> 50: 249384881 IO-APIC-level eth0
>> 58: 179123938 IO-APIC-level sky2
>> 66: 3 IO-APIC-level sky2, ohci1394
>> 74: 98956955 IO-APIC-level sky2
>> 82: 19952 IO-APIC-level sky2
>>217: 1865 IO-APIC-level libata, NVidia CK804
>>225: 263052 IO-APIC-level libata, ehci_hcd:usb1
>>NMI: 11098
>>LOC: 11214113
>>ERR: 0
>>MIS: 0
>>
>>Not only will the card not transmit anymore, it also doesn't
>>receive any packet at all. "ethtool -r eth3" doesn't change
>>anything, taking the interface down and up again also doesn't
>>help. The interrupt count of interrupt 74 stays constant after
>>failing.
>>
>>modprobe -r sky2; modprobe sky2
>>fixes the problem for me, so maybe resetting the card on TX
>>timeouts will help.
>>
>>The same problem appeared much earlier for another card which
>>shared interrupt 58 with an onboard card driven by skge. After
>>disabling the skge driver and rebooting, that card has been
>>stable so far.
>>
>>The card is connected to a 100 MBit switch.
>>
>>These problems didn't appear with sk98lin v8.14.3.3 (that
>>driver did survive about 10 TB of traffic before I rebooted).
>>
>>Register dumps are available on request (too big for this
>>list).
>>
>>I will now try sky2 0.13 and report back.
>
>
> And it hit the other interface after 200 MB transferred...
> NETDEV WATCHDOG: bridgeext0: transmit timed out
> sky2 bridgeext0: tx timeout
> NETDEV WATCHDOG: bridgeext0: transmit timed out
> sky2 transmit interrupt missed? recovered
>
> Although the driver claims to recover, it doesn't recover at all.
> What debug level would be advisable? It is now running with
> "modprobe sky2 debug=2", but I can't see more than the messages
> above.
>
> I have now added a hard reset routine to the tx timeout
> path and hope it won't kill my machine.

Apologies for the mangled whitespace; this is just a rough cut'n'paste.

--- linux-2.6.15/drivers/net/sky2.c.orig 2006-01-21 16:00:15.000000000 +0100
+++ linux-2.6.15/drivers/net/sky2.c 2006-01-21 14:08:28.000000000 +0100
@@ -1565,6 +1565,7 @@ static int sky2_autoneg_done(struct sky2
 	return 0;
 }
+static int sky2_reset(struct sky2_hw *hw);
 /*
  * Interrupt from PHY are handled outside of interrupt context
  * because accessing phy registers requires spin wait which might
@@ -1639,6 +1640,7 @@ static void sky2_tx_timeout(struct net_d
 	if (netif_msg_timer(sky2))
 		printk(KERN_ERR PFX "%s: tx timeout\n", dev->name);
+	if (0) {
 	sky2_write32(hw, Q_ADDR(txq, Q_CSR), BMU_STOP);
 	sky2_write32(hw, Y2_QADDR(txq, PREF_UNIT_CTRL), PREF_UNIT_RST_SET);
@@ -1646,6 +1648,12 @@ static void sky2_tx_timeout(struct net_d
 	sky2_qset(hw, txq);
 	sky2_prefetch_init(hw, txq, sky2->tx_le_map, TX_RING_SIZE - 1);
+	} else {
+		printk(KERN_ERR PFX "%s: recovering the HARD way...\n", dev->name);
+		sky2_down(dev);
+		sky2_reset(hw);
+		sky2_up(dev);
+	}
 }

And every time the kernel prints this message, I run the following
script:

#!/bin/bash
# Bounce the interface named in the latest "HARD" recovery message.
deadinterface=$(dmesg | grep HARD | tail -1 | sed "s/.*sky2 //;s/:.*//")
ip link set "$deadinterface" down
ip link set "$deadinterface" up

After that, everything continues to work until the next tx timeout
happens, and then the script saves the day again.
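
A slightly more defensive variant of the script might look like the
sketch below. It splits the log parsing from the link bounce so the
parsing can be tried on saved dmesg output, and it refuses to call ip
with an empty interface name. The function names last_dead_iface and
bounce_iface and the IP override variable are hypothetical; they are
not part of the driver or of iproute2.

```shell
#!/bin/bash
# Hypothetical helper: read kernel log text on stdin and print the
# interface named in the most recent "recovering the HARD way" line.
# Prints nothing if no such line is present.
last_dead_iface() {
    grep 'recovering the HARD way' | tail -n 1 |
        sed 's/.*sky2 //;s/:.*//'
}

# Hypothetical helper: take the given interface down and up again.
# IP can be overridden (e.g. IP=echo) for a dry run; this override is
# an assumption made for illustration only.
bounce_iface() {
    local iface="$1"
    if [ -z "$iface" ]; then
        echo "no dead interface found" >&2
        return 1
    fi
    "${IP:-ip}" link set "$iface" down &&
        "${IP:-ip}" link set "$iface" up
}
```

It would be run as: bounce_iface "$(dmesg | last_dead_iface)"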

More results about the circumstances of this bug: it seems to
trigger only under LOW load. As long as I keep the interface
busy, it has no problems at all.

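
Because the failure shows up when the link is quiet, it helps to tell
a hung card from a merely idle one. The quoted report notes that the
count for interrupt 74 stops advancing once the card has failed. A
minimal sketch for reading that count follows; the function name
irq_count is hypothetical, and it only looks at the first CPU column,
which is an assumption that matches the single-CPU output quoted
above.

```shell
#!/bin/bash
# Hypothetical helper: read /proc/interrupts-style text on stdin and
# print the first CPU column for the IRQ number given as $1.
# Prints nothing if the IRQ is not listed.
irq_count() {
    awk -v irq="$1" '{
        n = $1
        sub(/:$/, "", n)          # strip the trailing colon from "74:"
        if (n == irq) { print $2; exit }
    }'
}
```

Running irq_count 74 < /proc/interrupts twice, a few seconds apart,
while pinging through the interface should show a rising number on a
healthy card and an unchanged one after the failure.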
Regards,
Carl-Daniel
--
http://www.hailfinger.org/