#275: Scan for non-ESSID-broadcasting access point always fails
------------------------------------+---------------------------------------
Reporter: [EMAIL PROTECTED] | Owner: mrenzmann
Type: defect | Status: assigned
Priority: minor | Milestone: version 1.0.0 - first stable release
Component: madwifi: 802.11 stack | Version: trunk
Resolution: | Keywords:
------------------------------------+---------------------------------------
Comment (by [EMAIL PROTECTED]):
Yes, I agree - the simple, single mdelay() call is inelegant and isn't
guaranteed to work in all cases.
Unfortunately I haven't been able to convince myself that there is any
method which can (even in principle) work "correctly" in all cases. Every
method seems to have shortcomings.
Fundamentally, there seems to be a basic resource-crunch here. There
could be multiple threads of code (different wireless tools - e.g. a
daemon and a GUI panel) which wish to initiate active scans for various
purposes. If I understand things correctly (and perhaps I do not), the
underlying firmware is capable of handling only one active scan at a time.
This creates a conflict: if two or more entities try to scan at once,
then one of them is going to lose, in one way or another. Any of several
things can happen - its scan is never started (as is currently the case in
the main-trunk code), or it's delayed in its ability to start a scan until
the previous scanner is done, or its scan is started but then canceled
"behind its back" without warning or notice. Somebody loses; I'd guess
that the goal is to be reasonably fair in how this happens, and ensure
that no code or thread becomes "stuck" indefinitely.
The simple one-time mdelay() call makes an attempt to shut down a previous
scan semi-gracefully before starting its own. On the bad side, this isn't
guaranteed to work: the previous scan might take longer to shut down, and
another party might come in after the cancellation takes effect and start
another scan ("jumping the queue" in one way or another). On the good
side, this approach wouldn't seem to be capable of causing the calling
thread to hang indefinitely.
The method used in ticket #228 is another way of doing it. On the good
side, it'll proceed more quickly after the scan cancellation takes effect,
and it's more positive about making sure that the cancellation did take
effect. On the bad side, it looks to me as if it could hang the calling
thread for quite a while - if another thread "jumps the queue" and starts
another scan after the cancellation takes effect and before this thread
wakes up and checks, then the queue-jumper's scan wins out and the
original canceller has to wait an indefinite amount of time.
A safer compromise would be to use the method in #228, but with an
iteration count and a timeout after perhaps 50 to 100 milliseconds. If the
interface isn't out of active-scan mode by then, the code could either
re-cancel and wait again ("shooting the claim-jumper") or just bail out
gracefully. Either is probably better than being stuck indefinitely.
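To make the bounded wait concrete, here is a rough userspace sketch of the
idea. It is not driver code: struct scan_state, scan_still_active(),
sleep_ms() and wait_for_scan_cancel() are all hypothetical names invented for
illustration, and the "hardware" is simulated by a counter. In a real driver
the sleep would have to be whatever delay primitive is legal in the calling
context.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the driver's "scan in progress" state. */
struct scan_state {
    bool scanning;         /* true while an active scan is running */
    int  polls_until_idle; /* simulation only: polls left before the scan stops */
};

enum wait_result { WAIT_IDLE, WAIT_TIMEOUT };

/* Simulation hook: each poll moves the pretend hardware one step closer
 * to leaving active-scan mode. */
bool scan_still_active(struct scan_state *s)
{
    if (s->scanning) {
        if (s->polls_until_idle > 0)
            s->polls_until_idle--;
        else
            s->scanning = false;
    }
    return s->scanning;
}

/* Placeholder for a real delay; does nothing in this sketch. */
void sleep_ms(unsigned ms)
{
    (void)ms;
}

/* Poll the scan flag every step_ms, giving up once max_ms has elapsed.
 * The loop is bounded, so the caller can never hang indefinitely. */
enum wait_result wait_for_scan_cancel(struct scan_state *s,
                                      unsigned step_ms, unsigned max_ms)
{
    unsigned waited;

    if (step_ms == 0)
        step_ms = 1; /* avoid an endless loop on a zero step */

    for (waited = 0; waited <= max_ms; waited += step_ms) {
        if (!scan_still_active(s))
            return WAIT_IDLE;
        sleep_ms(step_ms);
    }
    return WAIT_TIMEOUT;
}
```

On WAIT_TIMEOUT the caller would choose between the two options above:
cancel again and call wait_for_scan_cancel() a second time, or return an
error to whoever asked for the scan.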
A fancier approach would be to maintain some sort of explicit queue of
active scans which had been requested, but not yet actually initiated.
Some piece of code (perhaps a separate kernel thread which managed the
interface, or perhaps the driver bottom-half) would terminate one scan and
start the next, as appropriate. This would be a much more complex and
invasive change to the driver. It's beyond what I'd want to tackle myself
at this stage, and frankly I'm not sure if it's really worth the effort.
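For what it's worth, the bookkeeping for such a queue is small; the hard
part is the locking and the integration with the driver, which this sketch
ignores entirely. Everything here (struct scan_queue, scan_request(),
scan_complete(), the fixed queue length, integer requester ids) is made up
for illustration and assumes ids are non-negative.

```c
#include <stdbool.h>
#include <stddef.h>

#define SCAN_QUEUE_LEN 4 /* arbitrary limit chosen for the sketch */

struct scan_queue {
    int    pending[SCAN_QUEUE_LEN]; /* requester ids waiting their turn */
    size_t head;                    /* index of the oldest pending entry */
    size_t count;                   /* number of pending entries */
    int    active;                  /* id of the running scan, -1 when idle */
};

void scan_queue_init(struct scan_queue *q)
{
    q->head = 0;
    q->count = 0;
    q->active = -1;
}

/* Ask for a scan. It starts at once if nothing is running; otherwise it
 * is queued in FIFO order. Returns false if the queue is full. */
bool scan_request(struct scan_queue *q, int requester)
{
    if (q->active < 0) {
        q->active = requester;
        return true;
    }
    if (q->count == SCAN_QUEUE_LEN)
        return false;
    q->pending[(q->head + q->count) % SCAN_QUEUE_LEN] = requester;
    q->count++;
    return true;
}

/* Called when the active scan finishes: hand the interface to the next
 * waiter. Returns the new active id, or -1 if the queue was empty. */
int scan_complete(struct scan_queue *q)
{
    if (q->count == 0) {
        q->active = -1;
        return -1;
    }
    q->active = q->pending[q->head];
    q->head = (q->head + 1) % SCAN_QUEUE_LEN;
    q->count--;
    return q->active;
}
```

With this shape nobody's scan is canceled behind its back and nobody can
jump the queue, at the cost of requesters having to wait their turn.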
I would hope that the higher-level software which is asking for scans
(e.g. wpa_supplicant, GUIs, etc.) would simply treat the results of a scan
conflict the way that they'd treat any other scan which didn't find the
desired APs - they'd idle for a while and then re-scan.
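As a sketch of what I mean by "idle and re-scan" on the caller's side: the
callback types, scan_with_retry(), fake_scan() and no_idle() below are all
hypothetical and only simulate the behaviour. This is not how wpa_supplicant
or any particular tool actually does it.

```c
#include <stdbool.h>

typedef bool (*scan_fn)(void *ctx);   /* returns true if the desired AP was found */
typedef void (*idle_fn)(unsigned ms); /* waits between attempts */

/* Try up to max_attempts scans, idling between failures. Returns the
 * attempt number that succeeded, or 0 if every attempt failed. */
int scan_with_retry(scan_fn scan, void *ctx, idle_fn idle,
                    unsigned idle_ms, int max_attempts)
{
    int attempt;

    for (attempt = 1; attempt <= max_attempts; attempt++) {
        if (scan(ctx))
            return attempt;
        if (attempt < max_attempts)
            idle(idle_ms); /* a lost scan conflict looks like any other miss */
    }
    return 0;
}

/* Simulation only: fails until the counter in ctx reaches zero. */
bool fake_scan(void *ctx)
{
    int *failures_left = ctx;

    if (*failures_left > 0) {
        (*failures_left)--;
        return false;
    }
    return true;
}

/* Simulation only: a real caller would actually wait here. */
void no_idle(unsigned ms)
{
    (void)ms;
}
```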
So... I'm quite willing to redo my patch, replacing the simple mdelay()
call with an iteration-limited "check flag, sleep if it's still scanning"
loop and a graceful bailout after a reasonable time (100 ms?). Would that
be satisfactory to all concerned? If so, perhaps it would be wise to have
the #228 patch use the same technique?
Is there a call other than mdelay() which would be preferable?
--
Ticket URL: <http://madwifi.org/ticket/275>
MadWifi <http://madwifi.org/>
Multiband Atheros Driver for Wireless Fidelity