Thanks for the quick response, Adrian.

On Wed, Sep 7, 2011 at 11:47 AM, Adrian Chadd <adr...@freebsd.org> wrote:
> If the DMA RX stop storm occurred then it meant the NIC thought it hit
> the end of the RX descriptor list (whether you did or not) and it just
> kept signalling it couldn't write packets anywhere.

I am not certain it is in fact the DMA RX stop storm. The lock-ups
often coincided with the storm back when it used to be much more
pervasive. Now we still see them even when there is no storm occurring,
i.e. the interrupt debug counters are not increasing and there are none
of the "unable to stop RX DMA" kernel log errors.

> I remember seeing this in FreeBSD, so I added some code to the RX
> tasklet to forcibly reset the PCU receive and re-link all the RX
> descriptors. It causes packet loss when it occurs (and it only occurs
> when I'm thrashing the NIC with too much UDP traffic) but I bet it
> could also occur if I enabled PHY errors (eg when doing radar
> detection) in a very busy+noisy environment.

I have seen you mention this in other postings. As stated above, I am
pretty certain it is occurring even when there is no DMA storm, but
what is intriguing is that you seem to be seeing the same trigger:
high volumes of traffic coming into the interface.

> ath9k handles the RX descriptors a bit differently but when I tried
> the same method in FreeBSD, it still ended up occasionally hitting
> RXEOL, firing off RXORN interrupts and then getting very pissed off at
> me. I'll do some further digging soon and I'll post an update to the
> list when I figure it out.

I will go back and see if in fact I can see an RXEOL being fired when
the lock-up occurs for us.

> If you're up for a bit of coding, here's what I did:

Always up for a challenge (^_^)

> * when RXEOL interrupt is received, set sc->sc_kickpcu=1; disable RXEOL/RXORN;
> * in ath_rx_tasklet() (in ath9k, it's not called that in FreeBSD) run
> the normal descriptor list check, then once that's done, if
> sc->sc_kickpcu == 1:
>    * set it to 0
>    * call pcu stop;
>    * re-initialise all of the descriptors
>    * call pcu start;
>    * re-enable interrupts, with RXEOL|RXORN re-enabled.

This may be as simple (with some additional success checks) as:

if (sc->sc_kickpcu == 1) {
    /* Stop the PCU/DMA, rebuild the RX descriptor list, then restart. */
    ath_stoprecv(sc);
    ath_rx_cleanup(sc);
    ath_rx_init(sc, ATH_RXBUF);
    ath_startrecv(sc);

    sc->sc_kickpcu = 0;
}

right after unlocking the spinlock. It has to come after the unlock
because three of those calls try to take the rxbuflock themselves.
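
To make the ordering concrete, here is a small userspace model of the
flow. It is only a sketch and not driver code: every mock_* name is made
up for illustration, the "lock" is a plain int rather than a spinlock,
and mock_restart_rx() merely stands in for the stop/cleanup/init/start
sequence. It follows your list in clearing the flag before the restart.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: none of these names are real ath9k symbols. */
struct mock_softc {
    bool kickpcu;        /* set by the interrupt path when RXEOL fires */
    bool rxeol_enabled;  /* models the RXEOL|RXORN interrupt mask bits */
    int  rxbuflock;      /* 0 = free, 1 = held; stands in for the spinlock */
    int  restarts;       /* number of times the receive path was rebuilt */
};

static void mock_lock(struct mock_softc *sc)
{
    assert(sc->rxbuflock == 0);  /* taking it twice would be a deadlock */
    sc->rxbuflock = 1;
}

static void mock_unlock(struct mock_softc *sc)
{
    assert(sc->rxbuflock == 1);
    sc->rxbuflock = 0;
}

/* Interrupt path: record the RXEOL and mask it until the tasklet has run. */
static void mock_intr_rxeol(struct mock_softc *sc)
{
    if (!sc->rxeol_enabled)
        return;
    sc->kickpcu = true;
    sc->rxeol_enabled = false;
}

/* Stands in for stop/cleanup/init/start, which take the lock themselves. */
static void mock_restart_rx(struct mock_softc *sc)
{
    mock_lock(sc);
    sc->restarts++;
    mock_unlock(sc);
}

/* Tasklet: normal processing under the lock, then the kick after unlock. */
static void mock_rx_tasklet(struct mock_softc *sc)
{
    mock_lock(sc);
    /* ... normal descriptor list processing would go here ... */
    mock_unlock(sc);

    if (sc->kickpcu) {
        sc->kickpcu = false;
        mock_restart_rx(sc);       /* safe: the lock is no longer held */
        sc->rxeol_enabled = true;  /* re-enable RXEOL|RXORN */
    }
}
```

If the kick were moved above mock_unlock(), the assert in mock_lock()
would trip, which is the model's version of the deadlock I want to
avoid in the real tasklet.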


> This reliably fixes all the crazy stuff I saw when I didn't do the
> above but it does give (to me, unacceptable) packet loss under very
> high UDP RX load.

Do you have this fix in the FreeBSD mainline? If so, would a similar
fix be beneficial for mainline ath9k?

v/r,
Daniel
_______________________________________________
ath9k-devel mailing list
ath9k-devel@lists.ath9k.org
https://lists.ath9k.org/mailman/listinfo/ath9k-devel
