On 6 November 2013 05:41, Bob Copeland <m...@bobcopeland.com> wrote: > Hi Adrian (and anyone else who wants to chime in), > > I'm looking to fix some accounting issues in the ath5k tx path, which > arise because we always try to keep one descriptor available at the > end for the hardware's sake. > > I believe you have some concrete, hard-won knowledge about when the > hardware marks a descriptor complete versus when the 'next' pointer > in the descriptor is accessed -- do you mind sharing?
Sure! The TL;DR version - you need to keep the the final descriptor around for each hardware queue. The legacy madwifi/freebsd way of doing it was to keep the last descriptor that was handled. This wasn't enough. It needs to be per TX queue. You also must never program TxDP after you've programmed it once - you can only program it again after you've completely torn down TX DMA for the particular queue. This is relevant for ath5k _and_ ath9k chips! The long version: The DMA engine follows the TX descriptor link list to process things. Once it finishes a descriptor, it'll read the link descriptor pointer to see what the next descriptor is. If it's NULL, it'll keep the address of the link pointer around so it can re-read it when it needs to start DMA again. The reason for this is because TX DMA can restart at any time for any reason. It can be gated by timers, by beacon reception, by burst / ready time, etc; it can be pre-empted by traffic from a higher priority queue. It stores this per QCU, so you need to store the last descriptor per QCU, rather than just the last it handled. Now, since the intended use case is that software will just write to the link pointer if there's a new packet to transmit, it needs to be handled in an atomic way. So, here are the conditions: * no transmit, no past transmit. TxDP[n] is 0x0. It's never transmitted anything before. You queue a frame to TxDP[n], set TxE[n] to 1, and it starts. * transmit is going on, TxDP[n] is non-NULL. You add something to the link pointer and then you set TxE[n] to 1 to kick things along. The DMA engine was NOT already sending the final frame in the queue to the air, so it'll just keep reading the list as normal. * transmit is going on, TxDP[n] is non-NULL and it points to the last frame. You sneak in an update to the link pointer before the hardware completes the frame, tickle TxE[n] and the hardware will keep going. * transmit is going on, TxDP[n] is non-NULL and it points to the last frame. It fnishes it, reads the NULL pointer, sees its NULL, stops transmitting. You update the link pointer, tickle TxE[n], but .. and here's the problem. So to make the last option work, the only sane way is for the hardware to re-read the link pointer to see if it's updated. It'll then always do the right thing - if it's just read a NULL link pointer it'll re-read it to see the updated value and continue transmitting. The _other_ option (which i think freebsd/madwifi did) was to just update link pointer and write a new TxDP before transmitting. The TDMA code in FreeBSD did this. Now this would also get around the race, right? If you hit the end of the list and it hit NULL, then you write a new TxDP[n] and set TxE to 1, it should start the new list, right? Then, comes the "weird crap' I found when digging through the RTL (chip source.) There are erm, "situations" where the chip won't actually process a TxDP update. Thus, even if you write a new TxDP out, even if the hardware is not transmitting, it may "ignore" the write. This is why I took the time in FreeBSD to do this: * staging descriptor is now per-QCU; * if i've never transmitted before, I will update TxDP * otherwise, if I've transmitted before, I will always append a frame to the transmit queue by adding it to the last descriptor (and if this is a done staging descriptor, so be it), and poke TxE[n] This works really, really well. Now for the ath9k relevant bits! All the ath9k chips behave the same. Even the EDMA chips. So, for EDMA, you _can_ just queue one frame per TX FIFO slot. You don't have the above behaviour. However, if you queue multiple frames to a TX FIFO slot (_not_ A-MPDU frames, but say, you append 15 frames to the cabq or 15 non-aggregate frames to a normal data queue by appending 15 frames to one TX FIFO entry, rather than one frame per FIFO entry) then the same behaviour as above may occur. You still need to keep a staging descriptor around whilst you're processing TX completions, as TX completions are per-frame, _not_ per TX FIFO slot. Once you've processed the whole slot, you don't need to keep the staging descriptor around for the next slot. FreeBSD does this for TX EDMA descriptor handling. I hope this clears things up! Next - ask me about: * read-and-clear bugs * PHY register read problems, and why ANI may occasionally look whacked -adrian _______________________________________________ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel