The recent journey - filtered frames, power save and multicast/cabq traffic

Adrian Chadd Wed, 26 Sep 2012 10:35:35 -0700

I figured I should summarise/brain dump the last week's worth of
discoveries into an email whilst it's fresh - partly so I don't
forget, partly so an interested party could pick it up and run with
it.


So, in no particular order:

* There's extra work being done processing empty RX lists. This
happens because the interrupt handler merely schedules ath_rx_proc()
to occur. So if an interrupt comes in whilst ath_rx_proc() is running,
it'll get re-scheduled - and when the second copy runs, the RX queue
has very likely just been cleaned out.
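
A toy model of that rescheduling, as a sketch only: the event names
and the single "scheduled" flag are assumptions made for illustration,
not the real taskqueue or ath(4) code.

```python
def count_rx_task_runs(events):
    """Replay events and return (total_runs, runs_that_found_nothing)."""
    rx_list = 0        # frames waiting on the RX descriptor list
    scheduled = False  # the interrupt handler has queued the RX task
    running = False    # the RX task is currently executing
    runs = 0
    empty_runs = 0
    for ev in events:
        if ev == "rx_intr":
            # A frame arrived; the handler only schedules the task.
            rx_list += 1
            scheduled = True
        elif ev == "task_start" and scheduled and not running:
            scheduled = False
            running = True
        elif ev == "task_end" and running:
            # The task drains everything on the list, including frames
            # that arrived while it was running.
            runs += 1
            if rx_list == 0:
                empty_runs += 1
            rx_list = 0
            running = False
    return runs, empty_runs


# An interrupt lands while the first run is in progress, so a second
# run is scheduled, and it finds the list already cleaned out.
wasted = count_rx_task_runs(
    ["rx_intr", "task_start", "rx_intr", "task_end", "task_start", "task_end"]
)
```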

* The TX completion path occasionally (more than I'd like,
unfortunately) returns HAL_EINPROGRESS on the first descriptor in the
list, upon receiving a TXEOL interrupt. TXEOL is "the DMA engine hit
the end of that HWQ descriptor list" - but this doesn't mean the last
descriptor had been handled and completed. It just means the DMA
engine hit a NULL link pointer. So, when a TXEOL comes in,
ath_tx_processq() gets called - but since the hardware is still
processing that descriptor, it comes back as incomplete.

* .. this wouldn't be a problem EXCEPT that the current driver does
some TX interrupt mitigation for legacy chips by only registering for
TXDESC, TXERR and TXEOL interrupts. Not TXOK. The driver then sets
every 5th non-aggregate descriptor to return a TXDESC interrupt. So if
you queue one frame and there's no TXDESC bit set, the above may
happen and you don't get any further interrupts for that queue.

.. which means that TX will stall until the next frame is queued.

.. which for the land of software queues doesn't necessarily happen,
because the driver now limits the number of frames going to the
hardware (and software queues the rest), so if the last descriptor(s)
in the list have this issue and you get a TXEOL only w/ no TXDESC set
in those descriptors, your TX queue will stall.
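
The stall can be reproduced in a small simulation. This is a sketch
under assumptions: the function name, the dictionary layout and the
"TXEOL fires just before the last descriptor completes" timing are all
made up for illustration; only the "every 5th descriptor gets TXDESC"
rule comes from the description above.

```python
def frames_left_unreaped(n_frames, txdesc_every=5, eol_early=True):
    """Return how many completed frames are never reaped by the driver."""
    descs = [{"done": False, "txdesc": (i + 1) % txdesc_every == 0}
             for i in range(n_frames)]
    reaped = 0

    def processq():
        # Stand-in for the completion handler: reap from the head of the
        # list and stop at the first descriptor still in progress.
        nonlocal reaped
        while reaped < n_frames and descs[reaped]["done"]:
            reaped += 1

    for i, desc in enumerate(descs):
        is_last = i == n_frames - 1
        if is_last and eol_early:
            processq()          # TXEOL arrives while the last frame is in flight
        desc["done"] = True     # hardware finishes the descriptor
        if desc["txdesc"]:
            processq()          # TXDESC interrupt for a flagged descriptor
        if is_last and not eol_early:
            processq()          # TXEOL arrives after completion (the easy case)
    return n_frames - reaped
```

With one queued frame and an early TXEOL the frame is never reaped;
with five frames the fifth carries TXDESC and everything is cleaned up.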

=> I need to file a couple of PRs on this one and make sure I tidy all
of this logic up. It's a mess/nightmare.

* There are still issues with how the TX watchdog is implemented -
thanks to preemption, reentrancy and concurrent code. The summary -
since the watchdog is kicked by just setting a value to 5 (seconds),
it's possible that two concurrent TX paths (say two ath_start()s or an
ath_start() and an ath_raw_xmit()) will overlap with a completion
handler, which will cause the watchdog to be set but not correctly
cleared in time. Specifically:
  + a frame can be TXed;
  + before the watchdog is set, the frame completes, ath_tx_processq()
gets called, clearing the watchdog;
  + then ath_start() or ath_raw_xmit() finishes running, which sets
the watchdog to 5.
  + 5 seconds later, the watchdog timeout hits.
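
The interleaving above can be replayed step by step. This is a toy
model, assuming a single shared watchdog value and hypothetical step
names; it is not the driver's actual locking or timer code.

```python
WATCHDOG_SECONDS = 5


def replay(steps):
    """Apply steps in order and return (watchdog, frames_in_flight)."""
    in_flight = 0
    watchdog = 0
    for step in steps:
        if step == "queue_frame":
            in_flight += 1                 # frame handed to the hardware
        elif step == "arm_watchdog":
            watchdog = WATCHDOG_SECONDS    # TX path kicks the watchdog
        elif step == "tx_complete":
            in_flight -= 1                 # completion handler runs ...
            watchdog = 0                   # ... and clears the watchdog
        else:
            raise ValueError(step)
    return watchdog, in_flight


def spurious_timeout(steps):
    """True if the watchdog is left armed with nothing in flight."""
    watchdog, in_flight = replay(steps)
    return watchdog > 0 and in_flight == 0


racy = spurious_timeout(["queue_frame", "tx_complete", "arm_watchdog"])
sane = spurious_timeout(["queue_frame", "arm_watchdog", "tx_complete"])
```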

=> I need to file another PR on this and give this a really good thinking about.

* The EAPOL frames are supposed to be treated specially. However, I am
not entirely sure that EAPOL is being set "right" - the hostapd
process writes frames using BPF, the ethertype isn't raw 802.11, so
ieee80211_output() punts it to the ethernet encaps and output code. So
what SHOULD happen is:
  + the ethernet encaps code marks it as ETHERTYPE_PAE;
  + the mbuf will get SOMEHOW marked as M_EAPOL;
  + the ath TX path treats this specially - EAPOL frames get the maximum
number of hardware retries (10) and can do a bunch of other things;
  + (later on) when I implement basic tail dropping to ensure the
driver isn't flooded with frames, we can match on M_EAPOL /
ETHERTYPE_PAE and ensure that those don't get tail dropped - only data
frames will.

  The core here (I think) is the hostapd driver sends raw 802.3 frames
rather than 802.11 frames and the current raw path doesn't call
ieee80211_classify() or anything like that; it assumes the ethernet
encaps code will take care of that.
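
As a sketch of the check being discussed: the ethertype for EAPOL
(802.1X) is 0x888E, and it sits at byte offset 12 of an untagged
802.3 header. The function names and the tail-drop policy below are
hypothetical; VLAN-tagged frames are deliberately not handled.

```python
import struct

ETHERTYPE_PAE = 0x888E  # IEEE 802.1X (EAPOL)
ETHER_HDR_LEN = 14      # dst MAC (6) + src MAC (6) + ethertype (2)


def is_eapol(frame: bytes) -> bool:
    """True if an untagged 802.3 frame carries the EAPOL ethertype."""
    if len(frame) < ETHER_HDR_LEN:
        return False
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    return ethertype == ETHERTYPE_PAE


def should_tail_drop(frame: bytes, queue_depth: int, queue_limit: int) -> bool:
    """Hypothetical policy: never tail-drop EAPOL, drop data when full."""
    if is_eapol(frame):
        return False
    return queue_depth >= queue_limit


eapol_frame = bytes(12) + struct.pack("!H", ETHERTYPE_PAE) + b"payload"
ipv4_frame = bytes(12) + struct.pack("!H", 0x0800) + b"payload"
```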

=> I need to file a PR and try chasing this up. But if someone would
like a mini project, I'd really appreciate some help in ensuring that
EAPOL frames get marked as ETHERTYPE_PAE on their journey through to
ath_start().

Ok, now the big one.

The disconnect problem turns out to be due to EAPOL frames being
rejected at the receiving STA. The MacBook Pro receives them and sends
back an ACK - but the application / tcpdump never sees them. There's
only a couple of reasons this could occur (sans bugs :-) - the CCMP IV
is out of sequence, and/or the sequence number is out of sequence.
Now, this occurs during scanning, when my MacBook Pro was doing
~100mbit of TCP traffic. Since it's not 100% utilised at that point (I
can get up to 170mbit+ TCP iperf right now) it sneaks in background
scanning and power-save transitions. Each of those power save
transitions forces _all_ multicast traffic to be stuffed into the cabq
(content after beacon queue), which goes out just after a beacon.

However!

The multicast traffic is (almost all?) non-QoS traffic, so it gets
stuffed into TID "16". There's a separate sequence number space for
each TID (0..15) as well as one for the non-QoS TID (16.)

In the atheros driver and net80211 stack, sequence number allocation
occurs _early_ during packet encapsulation. Only aggregate frames
delay allocating the sequence number (and leave it up to the driver to
assign.)

EAPOL traffic is also non-QoS for some reason, so it was also being
dumped into TID 16. So it shares the sequence number space with
multicast traffic.
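
A minimal sketch of per-TID sequence spaces, to make the sharing
concrete. The class and method names are invented for illustration;
the 12-bit width of the 802.11 sequence number is the only detail not
taken from the text above.

```python
NON_QOS_TID = 16
NUM_TIDS = 17      # QoS TIDs 0..15 plus the non-QoS TID 16
SEQ_MODULO = 4096  # 802.11 sequence numbers are 12 bits wide


class SeqAllocator:
    """One independent sequence counter per TID."""

    def __init__(self):
        self.next_seq = [0] * NUM_TIDS

    def alloc(self, tid):
        seq = self.next_seq[tid]
        self.next_seq[tid] = (seq + 1) % SEQ_MODULO
        return seq


alloc = SeqAllocator()
mcast_seq = alloc.alloc(NON_QOS_TID)  # multicast frame, non-QoS
eapol_seq = alloc.alloc(NON_QOS_TID)  # EAPOL frame, also non-QoS
qos_seq = alloc.alloc(0)              # QoS data on TID 0, separate space
```

The multicast and EAPOL frames draw consecutive numbers from the same
counter, while the TID 0 frame starts from its own.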

Now, whenever a station went to sleep, the driver would stuff all
traffic into the cabq. That's fine. But then subsequent frames in TID
16 that weren't multicast traffic would also get sequence numbers
assigned, and they may or may not go out immediately. If they go out
immediately then great - when the multicast traffic went out at the
next beacon interval, the sequence numbers would be "before" the
non-multicast traffic and I guess that multicast traffic would be
dropped. (or not, I'd have to double check.)

But if the hardware queue servicing TID 16 was busy, the non-multicast
frames would be dumped in a per-node software queue, and that queue
would be serviced when the hardware queue became free.

If the air was busy doing a bunch of traffic, the EAPOL frames would
go into the software queue, then sit there until after the aggregate
traffic in the queue went out. If we were lucky, this would occur
_before_ the cabq traffic went out. If we weren't lucky, it would be
delayed until after the cabq traffic went out - which may result in
the sequence/IV numbers being out of whack.

So to clarify:

* multicast frames with seq/IV A, B, C are queued - but a STA is in
sleep mode, so it goes into the CABQ;
* EAPOL frame with seq/IV D is queued - but the HWQ is busy, so it's
software queued;
* multicast frames with seq/IV E, F are queued, also in CABQ;
* beacon interval occurs;
* cabq is drained, so A, B, C, E, F make it out;
* then EAPOL with D gets out - but since E, F made it out already, the
receiver drops D.
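
The same sequence as a small simulation. The receiver here is an
assumption: it models a strict "number must increase" replay check,
which is how CCMP packet numbers behave; the exact check the MacBook
applied was not confirmed above.

```python
def receive(on_air):
    """Accept frames whose number is strictly greater than the last accepted."""
    last = -1
    accepted, dropped = [], []
    for name, number in on_air:
        if number > last:
            accepted.append(name)
            last = number
        else:
            dropped.append(name)
    return accepted, dropped


# Numbers are assigned early, in encapsulation order A..F.
numbers = {name: i for i, name in enumerate("ABCDEF")}

# The cabq drains A, B, C, E, F after the beacon; D leaves the software
# queue afterwards.
on_air = [(name, numbers[name]) for name in "ABCEFD"]
accepted, dropped = receive(on_air)
```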

Now, fixing this is a pain. I just "fixed" it for now by making TID 16
traffic go to the voice queue. It sucks, but it'll just have to do for
now. The real solution is to delay assigning sequence numbers until
the frames are going out to the hardware - and for TID 16 frames, that
may require some significantly complicated software queue handling to
ensure that the sequence numbers of frames are "right". (Since there may
already be unicast frames in the hardware queue for TID 16, then the
CABQ gets busted out - once that's done, the hardware will keep
transmitting whatever's in the HWQ servicing TID 16.)
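
A sketch of the "assign late" idea, with hypothetical helper names. It
only shows that numbering frames in the order they reach the hardware
keeps the on-air numbers increasing; it does not attempt the hard part
described above, where frames are already sitting in the hardware
queue when the CABQ is sent. Wraparound at 4096 is ignored in the
check.

```python
SEQ_MODULO = 4096


def assign_at_transmit(tx_order, start=0):
    """Number frames in the order they are handed to the hardware."""
    return [(name, (start + i) % SEQ_MODULO) for i, name in enumerate(tx_order)]


def strictly_increasing(on_air):
    """True if every frame's number is greater than the previous one."""
    numbers = [number for _, number in on_air]
    return all(a < b for a, b in zip(numbers, numbers[1:]))


# Same on-air order as the failure case: cabq first, then the EAPOL frame D.
late = assign_at_transmit("ABCEFD")
```

Here D receives the last number instead of the fourth, so a receiver
checking for increasing numbers would accept it.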

So, solving that particular quirk is going to take quite a bit of thought..

Last but not least:

* When stuffing TID 16 traffic into the cabq, the wrong lock is held.
The CABQ lock is held, rather than the hardware TXQ lock corresponding
to the TID. I'll have to hack this to grab both locks at the right
times.
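
One way holding both locks could look, as a sketch only. The lock
order (TID's hardware TXQ lock first, then the CABQ lock), the class
and the "pending" counter are all assumptions for illustration; the
real driver's ordering would need to be checked against its other
code paths.

```python
import threading


class TxQueue:
    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()
        self.frames = []
        self.pending = 0  # hypothetical per-queue bookkeeping


def enqueue_to_cabq(cabq, tid_txq, frame):
    """Queue a TID 16 frame on the cabq while holding both queue locks."""
    if cabq is tid_txq:
        # Same queue: take its lock once, a plain Lock is not reentrant.
        with cabq.lock:
            cabq.pending += 1
            cabq.frames.append(frame)
        return
    # Assumed fixed order, so two threads cannot deadlock on each other.
    with tid_txq.lock:
        tid_txq.pending += 1        # state protected by the TID's TXQ lock
        with cabq.lock:
            cabq.frames.append(frame)


cabq = TxQueue("cabq")
tid16_txq = TxQueue("tid16-hwq")
enqueue_to_cabq(cabq, tid16_txq, "mcast-frame")
```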

=> I'll file a PR and fix this; shouldn't be too hard.

Phew!





Adrian