#228: [PATCH] Apparent race condition between mode changing and scanning.
-------------------------------------+--------------------------------------
Reporter: [EMAIL PROTECTED] | Owner:
Type: defect | Status: new
Priority: major | Milestone: version 0.9.0 - move to
new codebase
Component: madwifi: other | Version: trunk
Resolution: | Keywords:
Patch_attached: 1 |
-------------------------------------+--------------------------------------
Comment (by [EMAIL PROTECTED]):
Hi,
I read the comments on #275 and added a polling loop with a timeout to the
patch currently attached to this ticket, to catch the case where
cancel_scan doesn't take effect. My version returns ETIMEDOUT on a timeout,
but the userspace tools (iwpriv) don't appear to report errors, so if
cancel_scan does time out (I simulated this by setting the timeout to 1 ms),
the driver is left in an inconsistent state and the user is not notified.
I added a message to the scan debug output, but enabling scan debugging
makes the race condition stop occurring, so a scan debug message doesn't
help at all.
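The bounded wait described above can be modelled in userspace roughly as
follows. This is a sketch under stated assumptions, not the code in the
attached patch: struct scan_state, wait_for_scan_cancel() and
SCAN_POLL_INTERVAL_MS are hypothetical names, the scanning flag stands in
for whatever state the driver actually checks, and the function follows the
kernel convention of returning a negative errno (-ETIMEDOUT).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical poll interval; the real patch may use something else. */
#define SCAN_POLL_INTERVAL_MS 1L

/* Hypothetical stand-in for the driver's "scan in progress" state.
 * In the driver this flag is cleared from another context (the scan
 * timer), so real code needs proper locking or memory barriers. */
struct scan_state {
	bool scanning;
};

/* Milliseconds elapsed since *start on the monotonic clock. */
static long elapsed_ms(const struct timespec *start)
{
	struct timespec now;

	clock_gettime(CLOCK_MONOTONIC, &now);
	return (long)(now.tv_sec - start->tv_sec) * 1000L +
	       (now.tv_nsec - start->tv_nsec) / 1000000L;
}

/* Poll until the scan has stopped or timeout_ms has elapsed.
 * Returns 0 once the scan is no longer running, or -ETIMEDOUT if it
 * is still running when the deadline passes. */
static int wait_for_scan_cancel(const struct scan_state *ss, long timeout_ms)
{
	struct timespec start;
	const struct timespec poll_interval = {
		.tv_sec = 0,
		.tv_nsec = SCAN_POLL_INTERVAL_MS * 1000000L,
	};

	clock_gettime(CLOCK_MONOTONIC, &start);
	while (ss->scanning) {
		if (elapsed_ms(&start) >= timeout_ms)
			return -ETIMEDOUT;
		nanosleep(&poll_interval, NULL);
	}
	return 0;
}
```

In the driver, the caller would pass a timeout of around 100 ms and treat
-ETIMEDOUT as the failure case that cannot currently be recovered from.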
When cancel_scan does time out, there is no way to bail out gracefully
with the current driver design; in fact, I think the problem lies more in
the design of the net80211 stack, which seems inherently racy. To get the
driver back to a known state, we must switch to the scanning state (which
forces all of the current settings to take effect), but if cancel_scan has
timed out, we cannot start a new scan without locking up (the original
problem this ticket tries to solve). So we are stuck.
My current solution is as follows.
Set the timeout to around 100 ms, which should be enough 99.9% of the time
(in theory, cancel_scan should take at most 10 ms to fire). However, if
cancel_scan does time out, or if another scan is started between
cancel_scan firing and us noticing that the scan was cancelled, there is
still a problem. Since there is currently no way to bail out cleanly, I
suggest using a plain printk to inform the user of the problem.
The failure symptoms I see are the interface dropping into ad-hoc mode
(from sta mode) and jumping between two frequencies, which is not the
desired result. Destroying and re-creating the interface does not fix the
problem; only unloading and reloading the driver does.
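One way to make the failure path explicit is sketched below, again as a
hypothetical userspace model rather than existing driver code. It assumes
the driver keeps a scan generation counter (not something the current code
is known to have), so that a cancel that genuinely timed out can be told
apart from a new scan started in the window described above. It uses
fprintf(stderr, ...) where the driver would use printk(KERN_WARNING ...).
The names scan_ctx, classify_cancel() and report_cancel_result(), the
interface name "ath0" in the usage, and the choice of -EBUSY are
illustrative only.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical: the driver bumps this counter every time a scan starts,
 * so a caller can tell "my scan never stopped" apart from "my scan
 * stopped and someone else started a new one". */
struct scan_ctx {
	unsigned int generation;
};

enum cancel_result {
	CANCEL_OK,		/* the cancelled scan stopped in time */
	CANCEL_TIMEDOUT,	/* the scan we cancelled is still running */
	CANCEL_RESTARTED,	/* a different scan started after our cancel */
};

/* gen_at_cancel: generation recorded when cancel_scan was requested.
 * wait_ret: result of the bounded wait (0 or -ETIMEDOUT). */
static enum cancel_result classify_cancel(const struct scan_ctx *sc,
					  unsigned int gen_at_cancel,
					  int wait_ret)
{
	if (sc->generation != gen_at_cancel)
		return CANCEL_RESTARTED;
	if (wait_ret == -ETIMEDOUT)
		return CANCEL_TIMEDOUT;
	return CANCEL_OK;
}

/* Warn the user directly, since iwpriv does not surface the error code.
 * fprintf(stderr, ...) stands in for printk(KERN_WARNING ...). */
static int report_cancel_result(enum cancel_result r, const char *ifname)
{
	switch (r) {
	case CANCEL_OK:
		return 0;
	case CANCEL_TIMEDOUT:
		fprintf(stderr, "%s: scan cancel timed out; reload the "
			"driver if the interface misbehaves\n", ifname);
		return -ETIMEDOUT;
	case CANCEL_RESTARTED:
		fprintf(stderr, "%s: a new scan started before the cancel "
			"was observed\n", ifname);
		return -EBUSY;
	}
	return -EINVAL;
}
```

A caller would record the generation when it requests cancel_scan, run the
bounded wait, then pass both to classify_cancel() before deciding whether
to warn the user.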
Unfortunately, I think a complete fix to the problem is non-trivial. We may
have to compromise and accept that it works 99.9% of the time, with the
possibility that the driver ends up in a completely unusable state, but
without locking up the system.
As a side note, we might want to declare the timeout value in
ieee80211_vars.h, so that a solution for #275 can use the same value.
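A minimal sketch of such a shared definition, assuming the macro names
below (they are hypothetical, not existing net80211 identifiers). The
compile-time check uses C11 _Static_assert; kernels of that era would
express the same check with BUILD_BUG_ON() inside a function.

```c
/* Fragment for ieee80211_vars.h (sketch; names are hypothetical). */

/* Expected worst-case time for cancel_scan to take effect
 * (about 10 ms, per the discussion in this ticket). */
#define IEEE80211_SCAN_CANCEL_EXPECTED_MS	10

/* Bound used by both the #228 and #275 fixes when waiting for a
 * cancelled scan to stop. */
#define IEEE80211_SCAN_CANCEL_TIMEOUT_MS	100

/* Keep a safety margin over the expected cancel latency. */
_Static_assert(IEEE80211_SCAN_CANCEL_TIMEOUT_MS >
	       IEEE80211_SCAN_CANCEL_EXPECTED_MS,
	       "scan-cancel timeout must exceed the expected cancel latency");
```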
--
Ticket URL: <http://madwifi.org/ticket/228>
MadWifi <http://madwifi.org/>
Multiband Atheros Driver for Wireless Fidelity