[re-adding the previous thread]
> As discussed earlier, please find attached tar ball for driver logs,
> traces and debug patches.
> I have added a README for reference in attached folder, go through the
> steps and let me know your feedback.
Thanks! The tracing doesn't seem to have worked, but the logging helps a
lot.
So what happens, to summarize, is that we have hardware that doesn't
support scanning on both bands at the same time, i.e.
SINGLE_SCAN_ON_ALL_BANDS isn't set.
This is with the patch to make -EPERM into "please do software scan" (we
can debate whether that should be -EPERM or 1 or something, doesn't
matter for this discussion).
This logic only happens here:
if (local->ops->hw_scan) {
WARN_ON(!ieee80211_prep_hw_scan(local));
rc = drv_hw_scan(local, sdata, local->hw_scan_req);
if (rc == -EPERM) {
set_bit(SCAN_HW_CANCELLED, &local->scanning);
__set_bit(SCAN_SW_SCANNING, &local->scanning);
rc = ieee80211_start_sw_scan(local, sdata);
}
(in __ieee80211_start_scan).
Next thing is that the software scan completes. This calls
__ieee80211_scan_completed(), which assumes hardware scan is running,
since
bool hw_scan = local->ops->hw_scan;
This now again runs drv_hw_scan():
if (hw_scan && !aborted &&
!ieee80211_hw_check(&local->hw, SINGLE_SCAN_ON_ALL_BANDS) &&
ieee80211_prep_hw_scan(local)) {
int rc;
rc = drv_hw_scan(local,
rcu_dereference_protected(local->scan_sdata,
lockdep_is_held(&local->mtx)),
local->hw_scan_req);
which presumably again returns -EPERM (conditions didn't change), and
thus we end up with scan abort.
To work around this, you were setting and testing HW_SCAN_CANCELLED in
two new places (you can see the setting above).
Looking at this in more detail, I think we have multiple issues with the
way the fallback is implemented.
First of all, this issue at hand. That's worked around, and could be -
more properly IMHO - worked around by setting a new bit
(HW_SCAN_SW_FALLBACK or so).
However, note that due to the way you implemented this, the software
scan requests that happens as a fallback behaves differently from a
regular software scan. This doesn't matter to you (right now) because
you only fall back to software if you're not connected, but in the case
that you _are_ connected, it would behave differently due to all the
code in __ieee80211_start_scan() that handles a single-channel software
scan differently if that matches the currently used channel.
So ultimately, I think the (combined) problem is that the fallback is
implemented badly.
My suggestion would be to separate the decision of whether to do
software or hardware scan from the hw_scan callback. This could be done
by flags, e.g. hw->scanflags & IEEE80211_SW_SCAN_WHILE_UNASSOC, which
would be your case IIRC, though TBD what that means for AP mode.
Obviously, you'll have to change all the (four) things that are checking
for "local->ops->hw_scan" to check something else. The three in
__ieee80211_start_scan() obviously just check the result of the
condition, and the one in __ieee80211_scan_completed() can already check
local->scanning today since that's set by then (and the scanning==0 case
is only valid with aborted, so not relevant for the hw_scan condition).
Perhaps we can actually do the return value from drv_hw_scan(), but in
that case I'd say it should look something like this:
https://p.sipsolutions.net/6ca3554b24e09ffb.txt
Yes, that's less efficient (due to the whole idle recalculation), but
it's much easier to understand what's going on, IMHO.
There's also a question about whether or not capabilities match, e.g. if
we do software scan we report 4 SSIDs, low prio scan, AP scan, random
scan, minimal scan content ... these are OK, but if there are ever
features that mac80211 *doesn't* support, we'll be in trouble if
somebody tries to use them.
Hope this helps. If my patch actually works (I obviously haven't tested)
then I can send it to the list with signed-off-by and all.
johannes