Hi,

I think I found out why we get a watchdog timeout now
and then.
I spent some time thinking, testing, and thinking some more,
and I found that this bug only triggers when there is no network
traffic (at least for me). That is kind of strange, and
I think a possible reason for it is the following race.


|---5secs - ~10 jiffies time---|---|OOPS
^                              ^
last real TX                   periodic work stops netif

At OOPS, the following happens:
The watchdog timer triggers, because the timeout of 5secs
is over. The watchdog first checks whether TX is stopped.
_Usually_ TX is only stopped from the TX handler to indicate
a full TX queue. But periodic work is different: there we need to
stop TX regardless of the TX queue state. So the watchdog sees
the stopped device and assumes it is stopped due to full
TX queues (which is a _wrong_ assumption in this case). It then
checks how long ago the last TX was started. If that is more than
5secs ago (which is the case for low or no traffic), it will fire
a TX timeout.
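
To make the check described above concrete, here is a small user-space
model of it. This is only a sketch, not kernel code: struct fake_netdev,
model_time_after() and model_watchdog_fires() are made-up names for
illustration, and HZ = 1000 is an assumption. The logic follows the
description: fire if the queue is stopped and the last TX start is more
than the timeout in the past.

```c
#include <stdbool.h>

/* Hypothetical user-space model of the watchdog check; not kernel code. */
#define HZ 1000UL /* assumed tick rate, for illustration only */

struct fake_netdev {
	unsigned long trans_start;    /* "jiffies" value of the last TX start */
	unsigned long watchdog_timeo; /* watchdog timeout in "jiffies" */
	bool queue_stopped;           /* true after the TX queue was stopped */
};

/* Wrap-safe "a is later than b", using a signed difference. */
static bool model_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* The watchdog only looks at "stopped?" and "how old is trans_start?".
 * It cannot tell *why* the queue was stopped. */
static bool model_watchdog_fires(const struct fake_netdev *dev,
				 unsigned long now)
{
	return dev->queue_stopped &&
	       model_time_after(now, dev->trans_start + dev->watchdog_timeo);
}
```

With trans_start = 0 and a timeout of 5 * HZ, the model fires for any
"now" later than 5 * HZ as soon as queue_stopped is set, no matter who
stopped the queue.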

I think the correct solution for this is to fake a TX start
on every periodic work execution. The fake is harmless and
prevents the watchdog from triggering, at least in my tests here. :)
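
The effect of the fake TX start can be seen in a small user-space
simulation of the race. Again only a sketch under assumptions: struct
sim_dev, sim_watchdog_fires() and sim_periodic_work_begin() are
hypothetical names, and the tick values in the note below are picked
for illustration.

```c
#include <stdbool.h>

/* Hypothetical user-space simulation of the race; not kernel code. */
struct sim_dev {
	unsigned long trans_start;    /* tick of the last (real or fake) TX start */
	unsigned long watchdog_timeo; /* watchdog timeout in ticks */
	bool queue_stopped;           /* true once TX has been disabled */
};

/* Fires if TX is stopped and trans_start is more than the timeout ago. */
static bool sim_watchdog_fires(const struct sim_dev *dev, unsigned long now)
{
	return dev->queue_stopped &&
	       (long)((dev->trans_start + dev->watchdog_timeo) - now) < 0;
}

/* Start of periodic work: TX is always stopped here. With fake_tx set,
 * trans_start is refreshed first, which is what the patch does. */
static void sim_periodic_work_begin(struct sim_dev *dev, unsigned long now,
				    bool fake_tx)
{
	if (fake_tx)
		dev->trans_start = now;
	dev->queue_stopped = true;
}
```

Example: timeout 5000 ticks, last real TX at tick 0, periodic work
starting at tick 4990, watchdog running at tick 5001. Without the fake
TX start the simulated watchdog fires; with it, trans_start is 4990 and
the watchdog stays quiet.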

Please test this, guys.

This patch is against 2.6.18.1 (and not 2.6.18, as the diff header suggests).


Index: linux-2.6.18/drivers/net/wireless/bcm43xx/bcm43xx_main.c
===================================================================
--- linux-2.6.18.orig/drivers/net/wireless/bcm43xx/bcm43xx_main.c       2006-10-19 21:30:42.000000000 +0200
+++ linux-2.6.18/drivers/net/wireless/bcm43xx/bcm43xx_main.c    2006-10-19 21:33:28.000000000 +0200
@@ -3165,7 +3165,15 @@ static void bcm43xx_periodic_work_handle
 
        badness = estimate_periodic_work_badness(bcm->periodic_state);
        mutex_lock(&bcm->mutex);
+
+       /* We must fake a started transmission here, as we are going to
+        * disable TX. If we didn't fake a TX, it would be possible to
+        * trigger the netdev watchdog if the last real TX is already
+        * some time in the past (slightly less than 5secs).
+        */
+       bcm->net_dev->trans_start = jiffies;
        netif_tx_disable(bcm->net_dev);
+
        spin_lock_irqsave(&bcm->irq_lock, flags);
        if (badness > BADNESS_LIMIT) {
                /* Periodic work will take a long time, so we want it to



-- 
Greetings Michael.