Dear open80211s developers,

My name is Pedro Larbig, possibly better known as "ASPj" from the aircrack-ng project. I am working on a mesh testbed at the Technical University of Darmstadt. We are trying to implement a real-life mesh network that runs AODV, BATMAN and IEEE 802.11s.

While the first two protocols run well, open80211s (or possibly the ath5k driver we are using) still seems to have some major bugs. We have a 20-node setup here in our building, monitored and configured from a central server that also provides PXE booting and an NFS filesystem.

First, running kernel 2.6.35.7 on a Debian 5 base, the nodes randomly crashed when 80211s was running:

    "spurious APIC interrupt on CPU#0, should never happen."

I updated the drivers with a recent compat-wireless and hit another bug that only occurred in mesh mode:

    "BUG: soft lockup - CPU#0 stuck for 61s!"

It was stuck at ath5k_tx_queue+0x4c2/0x59a, which was some RCU lock according to gdb. So I installed wireless-testing, which had the same problem. Afterwards, I disabled SMP support in the kernel, which made this deadlock disappear.

However, I then ran into another bug. The network itself was fine, but as soon as I started to generate traffic with lots of pings, certain nodes that had 8 to 12 active neighbors ran out of kernel memory! slabinfo told me that all the memory had been eaten by kmalloc calls. kmemleak found some suspected leaks from skb_copy in ieee80211_rx_handlers. So I enabled some mac80211 debugging, mounted debugfs and discovered that the queues were filling up:

    mndii-04:/sys/kernel/debug/ieee80211/phy3# cat queues
    00: 0x00000001/48992
    01: 0x00000000/0
    02: 0x00000001/0
    03: 0x00000000/0

    mndii-04:/sys/kernel/debug/ieee80211/phy3# grep "" statistics/rx*
    statistics/rx_expand_skb_head:0
    statistics/rx_expand_skb_head2:0
    statistics/rx_handlers_drop:42279
    statistics/rx_handlers_drop_defrag:0
    statistics/rx_handlers_drop_nullfunc:0
    statistics/rx_handlers_drop_passive_scan:0
    statistics/rx_handlers_drop_short:0
    statistics/rx_handlers_fragments:0
    statistics/rx_handlers_queued:293706

After adding some printk lines to rx.c, it seemed that most of those packets came from ieee80211_rx_h_action, hitting the "queue:" label there.
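
In case it helps with reproducing this, a check like the following could be run against the queues output quoted above. It is only a minimal sketch: the line format "NN: 0xMASK/COUNT" is taken from that output, the reading of COUNT as the number of pending frames is my assumption, and the names parse_queues, backlogged and the threshold of 1000 are made up for illustration.

```python
import re

# Assumed line format, taken from the quoted output: "NN: 0xMASK/COUNT".
# Treating COUNT as the number of pending frames is an assumption.
QUEUE_RE = re.compile(r"(\d+):\s*0x([0-9a-fA-F]+)/(\d+)")


def parse_queues(text):
    """Return a list of (queue_index, mask, pending) tuples."""
    return [
        (int(index), int(mask, 16), int(pending))
        for index, mask, pending in QUEUE_RE.findall(text)
    ]


def backlogged(entries, threshold=1000):
    """Return the indices of queues holding more than `threshold` entries."""
    return [index for index, _mask, pending in entries if pending > threshold]


# Sample copied from the output of node mndii-04 above.
sample = """\
00: 0x00000001/48992
01: 0x00000000/0
02: 0x00000001/0
03: 0x00000000/0
"""

entries = parse_queues(sample)
print(backlogged(entries))
```

On a node, the sample string would be replaced by the contents of the queues file under the debugfs path shown above; the central server could then collect the result from every node.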

The funny thing is that as soon as the first few nodes in crowded areas (and therefore with many neighbors) crashed due to memory problems, the other nodes, which then had fewer neighbors, slowly started emptying their queues.

I hope we can fix this so I get a working mesh network and you get closer to a 1.0 release :D

Greetings,
ASPj

_______________________________________________
Devel mailing list
Devel@lists.open80211s.org
http://open80211s.com/mailman/listinfo/devel