On Tue, 2009-04-28 at 09:42 -0700, Howard Chu wrote: > Dan Williams wrote: > > On Fri, 2009-04-24 at 15:18 -0700, Howard Chu wrote: > >> Dan Williams wrote: > >>> On Mon, 2009-04-20 at 15:37 -0700, Howard Chu wrote: > >>>> Howard Chu wrote: > >>>>> This is probably more related to the ath9k driver, but I wanted to > >>>>> start here > >>>>> in case anyone is familiar with it. I've been seeing this for the past > >>>>> couple > >>>>> months, and I just now rebuilt NM fresh from git and it's still > >>>>> happening: > >>>> > >>>> I seem to have ruled out the driver; doing a kill -9 on NetworkManager > >>>> so it > >>>> doesn't have the opportunity to tear down the connection on exit, shows > >>>> that > >>>> the wifi connection works perfectly once NetworkManager is gone. No > >>>> disassociation messages in dmesg, no pauses in ssh sessions, etc. > >>> > >>> Don't rule out the driver. Does the driver always return the currently > >>> associated AP in the scan list? If not, you might hit this. And the > >>> driver is being stupid, because of *course* the AP you're currently > >>> connected to should always be in the scan list, unless you're no longer > >>> connected to it. > >>> > >>> The code in NM grabs the SSID& BSSID on a 6 second timer, and tries to > >>> find that AP in the scan list. If it can't find it (because the driver > >>> didn't return that AP in the scan list) then it reports none. > >>> > >>> If that's your problem, it's a driver problem. > >> > >> How would I check this? Should it be obvious from "iwlist scan" ? That > >> returns > >> the current AP along with the other visible ones. > >> > >> Also, reviewing the comments in bug 291760, this problem is not just > >> isolated > >> to the ath9k driver. It's also being reported for ath5k, wl, iwl3945, > >> ipw2100, > >> rtl8187, and b43, across multiple kernel and driver revisions. As such it > >> seems unlikely to be the drivers' fault. > > > > Depends; it might show up in that scan, or it might not. If you can > > reliably get it to show up every time, that's great. But until 2.6.30, > > mac80211-based drivers would not always return the current AP. And some > > of the older drivers don't either, though fullmac drivers are more > > likely to be OK there. > > If you already know for a fact that certain drivers are incompatible with NM, > it seems you should be documenting that in your release notes or something. > Or, you should be maintaining a list of tested known-to-work drivers. > > > There is one window where NM wouldn't be able to find the AP; if NM just > > did a scan, and then the card reassociates to a different AP that's not > > in the scan list, and doesn't send an GIWSCAN event so that the AP list > > gets pulled (ipw2x00 do this, other drivers might not), then when NM > > goes to pull the BSSID off the card, the scan list doesn't contain the > > current AP. What NM should be doing here is to request that the > > supplicant grab the scan list again when it sees a BSSID it doesn't know > > about, but that's somewhat complicated. > > There must be more cases than this, because there are no other APs for my > card > to associate to. (They're all secured with WEP or WPA, and I only have > credentials for mine.) The only reason I ever see the card reassociate at all > is due to NM's scanning; with that patched out it just stays associated. > > > If the driver doesn't return the frequency of the BSSID in the scan > > results, or that frequency doesn't match what the card reports from > > SIOCGIWFREQ, then NM also can come up with (none). Check that the > > information from an 'iwlist scan' for that BSSID matches what 'iwconfig' > > reports when associated to that specific AP. > > > > So in conclusion there are actual driver bugs; (a) not all drivers scan > > results contain the currently associated AP in every scan, and (b) not > > all drivers emit scan results events when they associate to a new AP > > that's not already in the scan list, and finally (c) some drivers are > > just busted and return wrong channel information. > > Pretty sure (c) is not the case here, the info from iwlist scan and iwconfig > all matches. (b) won't happen in my current environment, so I can't test one > way or another. (a) doesn't appear to happen when I look, but I have no idea > how many scans are needed before the symptom occurs.
You can run NM with: NM_ACTIVE_AP_DEBUG=1 /usr/sbin/NetworkManager --no-daemon and get a lot more info about what's happening here and why the active AP isn't found in the scan list. Any chance you can do that and post the results? Dan > It seems to me that blaming the driver is not particularly useful unless you > can provide a procedure or script to demonstrate the driver bugs. In the > meantime, that whole spectrum of drivers is out there and people are trying > to > use them. And except for whatever NM's undocumented expectations, those cards > and drivers work fine. Since only NM causes problems, it's your > responsibility > to either help identify the problems so the driver writers can fix them, or > make NM work despite those problems. E.g., if you know that scans return > unreliable information, then *stop relying on the scan results*. Clearly the > driver can tell you if it's associated or not. Assuming that the association > is gone because the current AP doesn't show up in the current scan list, when > you know that scans can be incomplete, is stupid. > > Likewise, continual scanning seems to be counterproductive. The impact it has > on network throughput is significant: > > 64 bytes from 192.168.1.1: icmp_seq=7 ttl=64 time=1.57 ms > 64 bytes from 192.168.1.1: icmp_seq=8 ttl=64 time=1.56 ms > 64 bytes from 192.168.1.1: icmp_seq=9 ttl=64 time=4607 ms > 64 bytes from 192.168.1.1: icmp_seq=10 ttl=64 time=3604 ms > 64 bytes from 192.168.1.1: icmp_seq=11 ttl=64 time=2604 ms > 64 bytes from 192.168.1.1: icmp_seq=12 ttl=64 time=1604 ms > 64 bytes from 192.168.1.1: icmp_seq=13 ttl=64 time=604 ms > 64 bytes from 192.168.1.1: icmp_seq=14 ttl=64 time=1.54 ms > 64 bytes from 192.168.1.1: icmp_seq=15 ttl=64 time=1.54 ms > 64 bytes from 192.168.1.1: icmp_seq=16 ttl=64 time=1.54 ms > > Obviously the usual ping time is ~1.5ms; "iwlist scan" slows that down quite > a > lot. I'd rather wait 6 seconds *once* to find a new AP after I've > legitimately > lost an AP association, instead of waiting ~10 seconds every two minutes > rescanning for a list of APs that simply don't matter. > > At this point I've spent as much time as I can afford on it, and my patched > NM > works for me. > _______________________________________________ NetworkManager-list mailing list [email protected] http://mail.gnome.org/mailman/listinfo/networkmanager-list
