#228: [PATCH] Apparent race condition between mode changing and scanning.
-------------------------------------+--------------------------------------
      Reporter:  [EMAIL PROTECTED]  |       Owner:
          Type:  defect              |      Status:  new
      Priority:  major               |   Milestone:  version 0.9.0 - move to new codebase
     Component:  madwifi: other      |     Version:  trunk
    Resolution:                      |    Keywords:
Patch_attached:  1                   |
-------------------------------------+--------------------------------------
Comment (by [EMAIL PROTECTED]):

 Hi,

 I read the comments on #275 and added a loop with a timeout to the patch
 attached to this ticket, in order to catch the case where cancel_scan
 doesn't work. My solution returns ETIMEDOUT on a timeout, but it appears
 that the userspace tools (iwpriv) don't report errors. So if cancel_scan
 does time out (I simulated this by setting my timeout to 1ms), the driver
 is left in an inconsistent state and the user is not notified. I added a
 message to the scan debug output; however, enabling the scan debug output
 makes the race condition disappear, so a scan debug message doesn't help
 at all.
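
 As a rough illustration of the timeout loop described above, here is a
 minimal userspace sketch. It is not the attached patch: struct fake_scan,
 fake_scan_cancelled() and wait_for_scan_cancel() are hypothetical names,
 and each loop pass stands in for one millisecond of waiting.

```c
#include <errno.h>

/* Hypothetical stand-in for the driver's scan state; not a net80211 type. */
struct fake_scan {
    int polls_until_done;   /* polls that must pass before the scan stops */
};

/* Returns nonzero once the simulated scan has actually stopped. */
static int fake_scan_cancelled(struct fake_scan *s)
{
    if (s->polls_until_done > 0) {
        s->polls_until_done--;
        return 0;
    }
    return 1;
}

/*
 * Poll once per simulated millisecond, for at most timeout_ms polls.
 * Returns 0 if the scan stopped in time, -ETIMEDOUT otherwise.
 */
static int wait_for_scan_cancel(struct fake_scan *s, unsigned int timeout_ms)
{
    unsigned int waited;

    for (waited = 0; waited < timeout_ms; waited++) {
        if (fake_scan_cancelled(s))
            return 0;
        /* A real driver would sleep or reschedule here instead of spinning. */
    }
    return -ETIMEDOUT;
}
```

 With a 1ms timeout the loop gives up after a single poll, which is how the
 failure path can be forced for testing.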

 When the cancel_scan call does time out, there is no way to bail out
 gracefully with the current design of the drivers. In fact, I think it has
 more to do with the design of the net80211 stack, which seems inherently
 racy. To get the driver back to a known state, we must change to the
 scanning state (which forces all of the current settings), but if
 cancel_scan has timed out, we cannot start a new scan without locking up
 (the original problem this ticket tries to solve), so we are stuck.

 My current solution is this:
 Make the timeout value something like 100ms, which should be fine 99.9%
 of the time (cancel_scan should, in theory, take at most 10ms to fire).
 However, if cancel_scan does time out (or another scan is started between
 cancel_scan firing and us noticing that the scan was cancelled), then
 there is a problem. There is currently no way to bail out nicely, so I
 suggest using a plain printk to inform the user of the problem. The
 symptoms I see on failure are the interface going into ad-hoc mode (from
 sta mode) and jumping between two frequencies, which is not the desired
 result. Destroying and re-creating the interface does not fix the problem;
 only removing and reloading the driver does.
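
 The fallback described above could look roughly like the following
 userspace sketch. The enum, after_cancel() and the message text are
 hypothetical, and fprintf(stderr, ...) stands in for printk; the point is
 only that the failure is always reported and no new scan is started.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical outcome codes; not part of madwifi or net80211. */
enum next_step { START_NEW_SCAN, BAIL_OUT };

/*
 * Decide what to do after waiting for cancel_scan.
 * cancel_err is 0 on success or a negative errno such as -ETIMEDOUT.
 */
static enum next_step after_cancel(const char *ifname, int cancel_err)
{
    if (cancel_err == 0)
        return START_NEW_SCAN;

    /* Stand-in for printk(): always visible, unlike the scan debug output. */
    fprintf(stderr, "%s: cancel_scan failed (%d), not starting a new scan; "
            "driver state may be inconsistent\n", ifname, cancel_err);
    return BAIL_OUT;
}
```

 For example, after_cancel("ath0", -ETIMEDOUT) prints the warning and
 returns BAIL_OUT, where "ath0" is just an example interface name.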

 Unfortunately, I think a complete fix for the problem is non-trivial. We
 may have to compromise and accept that it works 99.9% of the time, with
 the possibility that the driver gets into a completely unusable state,
 but without locking up the system.
 As a side note, we might want to declare the timeout value in
 ieee80211_vars.h, so that a solution for #275 can use the same value.

-- 
Ticket URL: <http://madwifi.org/ticket/228>
MadWifi <http://madwifi.org/>
Multiband Atheros Driver for Wireless Fidelity
