#1897: kernel oops
-----------------------------------+----------------------------------------
 Reporter:  [EMAIL PROTECTED]  |             Type:  defect
   Status:  new                    |         Priority:  major 
Component:  madwifi: other         |          Version:  trunk 
 Keywords:                         |   Patch_attached:  0     
-----------------------------------+----------------------------------------
 I can reproduce a kernel crash in AP mode with madwifi revisions from at
 least r3314 through r3543, and probably with earlier and later revisions
 as well. 0.9.4 is also affected. I tested on a Gateworks IXP425 board and
 a PC-Engines WRAP board with a CM9, a Senao NMP8602 and an NMP8602-PLUS
 card.

 The software on the test box sends Ethernet frames to a client that is
 either not yet connected or was connected, went out of range and is
 re-associating. The oops is preceded by a rate-control failure for the
 client (this happens with at least ath_rate_sample and ath_rate_onoe):

 {{{ath_rate_sample: no rates for 00:02:6f:47:f1:0a?}}}

 Because no rates are available, the code runs into an error condition in
 ath/if_ath.c:

 {{{
         if (txrate == 0) {
                 /* Drop frame, if the rate is 0.
                  * Otherwise this may lead to the continuous transmission
                  * of noise. */
                 printk("%s: invalid TX rate %u (%s: %u)\n", dev->name,
                         txrate, __func__, __LINE__);
                 return -EIO;
         }
 }}}

 After this, the cleanup function cleanup_ath_buf_debug() is called, and
 this is where the kernel oopses:


 {{{
         if (bf->bf_skbaddr) {
                 bus_unmap_single(
                         sc->sc_bdev,
                         bf->bf_skbaddr,
                         (direction == BUS_DMA_FROMDEVICE ?
                                 sc->sc_rxbufsize : bf->bf_skb->len),
                         direction);
                 bf->bf_skbaddr = 0;
                 bf->bf_desc->ds_link = 0;
                 bf->bf_desc->ds_data = 0;
         }
 }}}

 In my "no rates" case, when sending data to the client, the direction is
 BUS_DMA_TODEVICE but bf->bf_skb is NULL, hence the crash when
 dereferencing bf->bf_skb->len.
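
 The failing length computation can be illustrated in isolation. The
 following is only a minimal sketch with hypothetical stand-in types
 (sk_buff_stub, ath_buf_stub) and a hypothetical helper unmap_len(); it is
 not the madwifi code and not a proposed patch. It shows how the length
 passed to the unmap call could be chosen without dereferencing a NULL
 bf_skb. Whether skipping the unmap is actually safe in the driver is an
 open question that this sketch does not answer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver types (assumptions for this
 * sketch, not the real madwifi definitions). */
enum dma_dir { BUS_DMA_FROMDEVICE, BUS_DMA_TODEVICE };

struct sk_buff_stub {
    unsigned int len;               /* payload length of the frame */
};

struct ath_buf_stub {
    unsigned long bf_skbaddr;       /* DMA address, 0 when not mapped */
    struct sk_buff_stub *bf_skb;    /* NULL in the reported error path */
};

/* Return the length to hand to the unmap call, or 0 when it cannot be
 * determined because a TX buffer has no skb attached. */
static unsigned int unmap_len(const struct ath_buf_stub *bf,
                              enum dma_dir direction,
                              unsigned int rxbufsize)
{
    assert(bf != NULL);

    /* RX buffers always use the fixed receive buffer size. */
    if (direction == BUS_DMA_FROMDEVICE)
        return rxbufsize;

    /* TX buffers take the length from the skb; guard the NULL case
     * that triggers the oops described above. */
    if (bf->bf_skb == NULL)
        return 0;

    return bf->bf_skb->len;
}
```

 A caller would treat a return value of 0 as "nothing to unmap" and could
 log the condition instead of crashing.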

 As a workaround I tried to simply handle the "no rates" case in
 ath_tx_start() by setting the txrate to either
 {{{
                 rix = sc->sc_minrateix;
                 txrate = rt->info[rix].rateCode;
 }}}
 or
 {{{
 vap->iv_mcast_rate
 }}}

 While this prevents the oops by avoiding the error path above, it exposes
 a memory leak. I am not sure whether the leak is caused by my workaround
 or by the same problem that initially caused the "no rates" condition.


 I have tried to dig deeper but didn't get very far. I did notice that the
 "ath_rate_sample: no rates for 00:02:6f:47:f1:0a?" message seems to be
 triggered by an ath_tx_start() call that happens before
 ieee80211_node_join() has completed. Since I was looking for a memory
 leak, I compared the address of the ath_node when ath_rate_sample is
 called for the station I want to send data to with its address when the
 join() is done. It is the same address, so there are NOT two nodes being
 created for the frame when the STA hasn't yet been fully associated at
 tx time.

 But I am rather helpless as to where to look from here.

 I have attached the test tool that I use to reproduce this. All it does is
 create a VAP in AP mode and send as many Ethernet packets as possible to
 the client's MAC address. As soon as the client associates with the AP,
 the oops happens (or, with my workaround, I see the "ath_rate_sample: no
 rates for 00:02:6f:47:f1:0a?" message and then lose about 4k of memory).

 I call my tool with "wlantest -c 11 -m 00:02:6f:47:f1:0a" and then
 associate a second box as a client to my AP on channel 11. Some brands of
 clients do not seem to trigger this situation, but with all Atheros or
 Intel cards/drivers I can reproduce it 100% of the time. While my tool
 triggers this bug in an odd situation (sending to a MAC address that isn't
 yet known to the system), I have seen the exact same problem in the wild,
 just not as easily reproducible. I also get the "no rates" problem in
 another setup, with a regular client connecting to the AP with WEP in
 "open" mode, but I am not done debugging this. All I know is that it is
 the same "no rates" error that would later cause the oops. In this setup I
 am not doing anything unusual like sending directly to a
 not-yet-associated MAC; it is just a regular laptop trying to surf the
 Web.

 I would love to help debug this further, but I am out of ideas about where
 to dig right now.


 joerg

-- 
Ticket URL: <http://madwifi.org/ticket/1897>
madwifi.org <http://madwifi.org/>
Multiband Atheros Driver for Wireless Fidelity