On Tue, Nov 29, 2016 at 8:59 PM, Johannes Berg
> On Wed, 2016-11-23 at 15:59 +0800, Chris Chiu wrote:
>> ieee80211_iface_work() will check if sw scanning is in progress
>> before handling block ack session. In our case, the RTL8821AE
>> operate in station mode, when tx session expired, DELBA packet
>> stuck during sw scanning and so do other data packets.
>> ieee80211_scan_state_decision() will take lots of time in
>> SCAN_SUSPEND/SCAN_RESUME state due to !tx_empty or bad_latency.
>> Then the sw scanning mostly take > 20 seconds to finish or even
>> worse in our case RTL8821AE have 37 channels for 2G+5G to scan
>> and tx stalls ~300 seconds. AP side still thinks the connection
>> is alive because it still receives the QoS NULL packet from STA.
>> So the link state will never change but actually no data tx/rx
>> during this long time.
>> This commit tries to send out packet in SCAN_SUSPEND state so the
>> sw scanning can complete more efficiently and less affect on Block
>> Ack session handling. Verified on RTL8821AE for > 30000 pings and
>> no Tx BA session stuck observed.
> The premise seems fairly reasonable, although I'm a little worried that
> if so much new traffic is coming in we never finish the scan suspend?
> Actually, the queues are still stopped, so it's only management frames
> that can come in, so that should be ok?
Actually it will finish scan eventually and back to SCAN_DECISION
state but almost 20~30 seconds elapsed. The local->scanning should be
cleared after all channels been scanned. However, from the debug
messages I added in ieee80211_iface_work(), it still returns when
check local->scanning and the DELBA still has no chance to be
transferred then stuck again at the next scan state machine. Supposed
to be another scan request issued but I don't know who's the killer.
Except to find who issue the next scan request, BA session have no
chance to reset in this long scan period (>20s) still need to be taken
>> + test_and_clear_bit(SCAN_SUSPEND_SCANNING, &local->scanning);
> That makes no sense, you're not checking the return value, just clear
> the bit without test.
>> @@ -844,6 +846,8 @@ static void ieee80211_scan_state_suspend(struct
>> ieee80211_local *local,
>> /* disable PS */
>> + __set_bit(SCAN_SUSPEND_SCANNING, &local->scanning);
> Why are you using the non-atomic version here, vs. the atomic one
You're right. I just want to clear_bit and set_bit in this case, sorry
for that confusing. Or any better suggestion?