Dear open80211s developers,

My name is Pedro Larbig, possibly better known as "ASPj" from the 
aircrack-ng project. I am working on a mesh testbed at the Technical 
University of Darmstadt.
We are trying to implement a real-life mesh network that runs AODV, 
BATMAN and IEEE 802.11s.

While the first two protocols run well, open80211s (or possibly the 
ath5k driver we're using) still seems to have some major bugs.

We have a 20-node setup here in our building, monitored and 
configured from a central server that also supplies PXE booting and an 
NFS filesystem.

First, running kernel 2.6.35.7 on a Debian 5 base, the nodes randomly 
crashed when 80211s was running: "spurious APIC interrupt on CPU#0, 
should never happen."
I updated the drivers with a recent compat-wireless and hit another 
bug that only occurred in mesh mode: "BUG: soft lockup - CPU#0 
stuck for 61s!"
It was stuck at ath5k_tx_queue+0x4c2/0x59a, which was an RCU lock 
according to gdb.
So I installed wireless-testing, which had the same problem.
Afterwards, I disabled SMP support in the kernel, which made this 
deadlock disappear.

However, I ran into another bug.
The network itself was fine, but as soon as I started to generate 
traffic with lots of pings, certain nodes that had 8 to 12 active 
neighbors ran out of kernel memory!
slabinfo told me that all the memory had been eaten by kmalloc calls. 
kmemleak found some suspected leaks from skb_copy in 
ieee80211_rx_handlers.
So I enabled some mac80211 debugging, mounted debugfs and discovered 
that the queues were filling up:
mndii-04:/sys/kernel/debug/ieee80211/phy3# cat queues
00: 0x00000001/48992
01: 0x00000000/0
02: 0x00000001/0
03: 0x00000000/0
mndii-04:/sys/kernel/debug/ieee80211/phy3# grep "" statistics/rx*
statistics/rx_expand_skb_head:0
statistics/rx_expand_skb_head2:0
statistics/rx_handlers_drop:42279
statistics/rx_handlers_drop_defrag:0
statistics/rx_handlers_drop_nullfunc:0
statistics/rx_handlers_drop_passive_scan:0
statistics/rx_handlers_drop_short:0
statistics/rx_handlers_fragments:0
statistics/rx_handlers_queued:293706
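
For anyone who wants to watch for the same symptom on their own nodes, here is a minimal sketch in Python that parses the contents of the "queues" file and reports backlogged queues. It assumes the line format seen in the output above, and it assumes that the number after the slash is the count of pending frames; the function names and the threshold of 1000 are hypothetical and chosen only for illustration.

```python
import re

# Assumed line format of the mac80211 debugfs "queues" file, taken from the
# output above: "<queue>: <hex mask>/<count>". Treating the count as the
# number of pending frames is an assumption.
QUEUE_LINE = re.compile(r"^(\d+):\s+0x([0-9a-fA-F]+)/(\d+)$")


def parse_queues(text):
    """Return {queue_index: (mask, pending)} for each well-formed line."""
    queues = {}
    for line in text.splitlines():
        match = QUEUE_LINE.match(line.strip())
        if match:
            index, mask, pending = match.groups()
            queues[int(index)] = (int(mask, 16), int(pending))
    return queues


def backlogged(queues, threshold=1000):
    """List queue indices whose pending count exceeds the (hypothetical) threshold."""
    return sorted(q for q, (_, pending) in queues.items() if pending > threshold)


# Sample copied from the phy3 output above.
SAMPLE = """\
00: 0x00000001/48992
01: 0x00000000/0
02: 0x00000001/0
03: 0x00000000/0
"""

if __name__ == "__main__":
    parsed = parse_queues(SAMPLE)
    for q in backlogged(parsed):
        mask, pending = parsed[q]
        print(f"queue {q}: mask {mask:#x}, {pending} frames pending")
```

On a live node, the same functions could be fed the text read from the "queues" file under the phy's debugfs directory instead of SAMPLE.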

After adding some printk lines to rx.c, it seemed that most of those 
packets came from ieee80211_rx_h_action, hitting the "queue:" label there.

The funny thing is, as soon as the first few nodes in crowded areas (and 
therefore with many neighbors) crashed due to memory problems, the other 
nodes, now having fewer neighbors, slowly started emptying their queues.


I hope we can fix this so I get a working mesh network and you get 
closer to a 1.0 release :D

Greetings
ASPj

_______________________________________________
Devel mailing list
Devel@lists.open80211s.org
http://open80211s.com/mailman/listinfo/devel
