#1897: kernel oops
-----------------------------------+----------------------------------------
Reporter: [EMAIL PROTECTED] | Type: defect
Status: new | Priority: major
Component: madwifi: other | Version: trunk
Keywords: | Patch_attached: 0
-----------------------------------+----------------------------------------
I can reproduce a kernel crash in AP mode with madwifi revisions from at
least r3314 through r3543, and probably with earlier and later revisions
as well. 0.9.4 is also affected by this bug. I tried this on a Gateworks
IXP425 board and a PC-Engines WRAP board with a CM9, a Senao NMP8602 and
an NMP8602-PLUS card.
The software on the test box is sending ethernet frames to a client that
is not yet connected, or that was connected, went out of reach and is
re-associating. The oops is preceded by a rate-control failure for the
client (this happens with at least ath_rate_sample and ath_rate_onoe):
{{{ath_rate_sample: no rates for 00:02:6f:47:f1:0a?}}}
Because of the "no rates" condition, the code runs into an error path in
ath/if_ath.c:
{{{
if (txrate == 0) {
        /* Drop frame, if the rate is 0.
         * Otherwise this may lead to the continuous transmission of
         * noise. */
        printk("%s: invalid TX rate %u (%s: %u)\n", dev->name,
                txrate, __func__, __LINE__);
        return -EIO;
}
}}}
After this, the cleanup function cleanup_ath_buf_debug() is called, and
this is where the kernel oopses:
{{{
if (bf->bf_skbaddr) {
        bus_unmap_single(
                sc->sc_bdev,
                bf->bf_skbaddr,
                (direction == BUS_DMA_FROMDEVICE ?
                        sc->sc_rxbufsize : bf->bf_skb->len),
                direction);
        bf->bf_skbaddr = 0;
        bf->bf_desc->ds_link = 0;
        bf->bf_desc->ds_data = 0;
}
}}}
In my "no rates" case, when sending data to the client, the direction is
BUS_DMA_TODEVICE but bf->bf_skb is NULL, hence the crash when
dereferencing bf->bf_skb->len.
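The failing length selection can be modelled outside the kernel. The
following is only a sketch with simplified stand-in types: the struct
names, the enum and unmap_len() are hypothetical and are not madwifi
code. It shows a guard that avoids dereferencing a NULL skb on the TX
side. Whether skipping the unmap is the right policy for the real driver
is an open question, since the DMA mapping would then still need to be
released somewhere.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the driver types (assumption, not madwifi). */
enum dma_dir_model { DIR_TODEVICE, DIR_FROMDEVICE };

struct skb_model {
    unsigned int len;              /* payload length of the frame */
};

struct ath_buf_model {
    unsigned long bf_skbaddr;      /* nonzero while a DMA mapping exists */
    struct skb_model *bf_skb;      /* NULL in the "no rates" error path */
};

/*
 * Pick the length that would be passed to the unmap call.
 * RX buffers always use the fixed RX buffer size. TX buffers use the
 * skb length, but only if the skb is still attached; 0 signals that
 * the caller has no valid length and must not dereference bf_skb.
 */
static unsigned int unmap_len(const struct ath_buf_model *bf,
                              enum dma_dir_model dir,
                              unsigned int rxbufsize)
{
    assert(bf != NULL);
    if (dir == DIR_FROMDEVICE)
        return rxbufsize;
    return bf->bf_skb != NULL ? bf->bf_skb->len : 0;
}
```

With an skb attached, the TX length is the skb length; with bf_skb set
to NULL, the helper returns 0 instead of crashing.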
As a workaround I tried to simply handle the "no rates" case in
ath_tx_start() by setting the txrate to either
{{{
rix = sc->sc_minrateix;
txrate = rt->info[rix].rateCode;
}}}
or
{{{
vap->iv_mcast_rate
}}}
But while this helps prevent the oops by avoiding the above error path,
it exposes a memory leak. I am not sure if the leak is caused by "my fix"
or if the leak is caused by the same problem that initially caused the
"no rates".
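The first variant of the workaround (falling back to the minimum rate
index) can be sketched in isolation as follows. This is a user-space
model under stated assumptions: the struct layout, the table size and
pick_txrate() are hypothetical, and only the names rateCode and info
mirror the snippet above. It says nothing about the leak.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the rate table (assumption, not the HAL types). */
struct rate_info_model {
    unsigned char rateCode;            /* hardware rate code, 0 = invalid */
};

struct rate_table_model {
    size_t count;                      /* number of valid entries in info[] */
    struct rate_info_model info[8];
};

/*
 * Return the rate code to transmit with. If rate control produced no
 * rate (chosen == 0), fall back to the entry at the minimum rate index
 * instead of letting the caller hit the "invalid TX rate" error path.
 */
static unsigned char pick_txrate(const struct rate_table_model *rt,
                                 unsigned char chosen,
                                 size_t minrateix)
{
    assert(rt != NULL);
    assert(minrateix < rt->count);
    if (chosen != 0)
        return chosen;
    return rt->info[minrateix].rateCode;
}
```

The rate codes used to exercise this are arbitrary placeholders, not
real Atheros rate codes.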
I have tried to dig deeper but didn't get very far. I did notice the
"ath_rate_sample: no rates for 00:02:6f:47:f1:0a?" seems to get triggered
by an ath_tx_start() that is called before ieee80211_node_join() was
done. Since I was looking for a memory leak I looked at the addresses of
the ath_nodes when ath_rate_sample is called for the station I want to
send data to and when the join() is done - but that is the same address,
so there are NOT two nodes being created for the frame when the STA hasn't
yet been fully associated at tx-time.
But I am rather helpless as to where to look from here.
I have attached the test tool that I use to reproduce this. All it does is
create a VAP in AP mode and send as many ethernet packets as possible to
the client's MAC address. As soon as the client associates to the AP the
oops will happen (or, with my workaround, I will see the "ath_rate_sample:
no rates for 00:02:6f:47:f1:0a?" message and then have lost about 4k of
memory).
I am calling my tool with "wlantest -c 11 -m 00:02:6f:47:f1:0a" and then
associate a secondary box as a client to my AP on channel 11. I have
observed that some brands of clients do not seem to be able to trigger
this situation, but with all Atheros or Intel cards/drivers I can
reproduce this 100% of the time. While my tool seems to trigger this bug
in an odd
situation (sending to a MAC address that isn't yet known to the system) I
have seen the exact same problem in the wild, just not reproducible so
easily. I am also getting the "no rates" problem in another setup with a
regular client connecting to the AP with WEP in "open" mode, but I am not
done debugging this. All I know is that it is the same "no rates" error
that later would cause the oops. In this setup I am not doing anything
weird like
sending directly to a not-yet-associated MAC but having a regular laptop
trying to surf the Web.
I would love to help debug this further, but am out of ideas where to dig
further right now.
joerg
--
Ticket URL: <http://madwifi.org/ticket/1897>
madwifi.org <http://madwifi.org/>
Multiband Atheros Driver for Wireless Fidelity
_______________________________________________
Madwifi-tickets mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/madwifi-tickets