On Fri, 2009-07-10 at 20:32 -0300, Ricardo Salveti de Araujo wrote:
> Hi all,
>
> We're currently using Network Manager at Mamona, a developer distro
> based on open embedded for Nokia tablet devices.
>
> Currently we're using NM 0.7 and we just created the packages for NM
> 0.7.1, but while testing it, I faced some race condition issues with
> the usbnet.

Ha! I've been looking for that race off and on for a while. Thanks for
finding the root cause. Can you try out this commit from master? If it
works for you I'll also cherry-pick to 0.7.x.

commit 302c9fcbccf3ad945afbc3f58e42013045c6e352
Author: Dan Williams <[email protected]>
Date:   Mon Jul 13 19:40:39 2009 -0400

    netlink: fix race that caused stale carrier state signals

    Found by Ricardo Salveti de Araujo <[email protected]>

    The link cache was updated immediately, but the carrier state signals
    were emitted a lot later, when the cache data was already stale. So
    just update the cache at the same time we emit the signals. The
    carrier-state-request stuff wasn't originally converted to deferred
    for any netlink-specific reason, just to smooth the initial device
    creation process in NM.

Thanks!
Dan

> Here's the log of the Network Manager, while booting the device (with
> additional debug):
> <info> starting...
> <info> nm_netlink_monitor_open_connection()
> <info> nm_netlink_monitor_request_status() <- add the handler to the main loop
> <info> deferred_emit_carrier_state() <- consume the cache
> <info> netlink_object_message_handler() (lo) IFF_LOWER_UP
> <info> netlink_event_input()
> <info> netlink_object_message_handler() (lo) IFF_LOWER_DOWN
> <info> netlink_object_message_handler() (usb0) IFF_LOWER_DOWN
> <info> nm_netlink_monitor_request_status() <- add the handler to the main loop
> <info> (usb0): new Ethernet device (driver: 'ehci_udc')
> <info> (usb0): exported as /org/freedesktop/Hal/devices/net_7a_ce_13_55_f7_81
> <info> Trying to start the supplicant...
> <info> netlink_event_input()
> <info> (usb0): device state change: 1 -> 2
> <info> (usb0): bringing up device.
> <info> (usb0): preparing device.
> <info> (usb0): deactivating device (reason: 2).
> <info> Setting system hostname to 'localhost.localdomain' (no default device)
> <info> netlink_object_message_handler() (usb0) IFF_LOWER_UP
> <info> (usb0): carrier now ON (device state 2)
> <info> (usb0): device state change: 2 -> 3
> <info> netlink_event_input()
> <info> Trying to start the system settings daemon...
> <info> deferred_emit_carrier_state() <- consume the cache
> <info> netlink_object_message_handler() (lo) IFF_LOWER_UP
> <info> netlink_object_message_handler() (usb0) IFF_LOWER_DOWN
> <info> (usb0): carrier now OFF (device state 3)
> <info> (usb0): device state change: 3 -> 2
> <info> (usb0): deactivating device (reason: 40).
>
> The problem is that at the end the device carrier status is OFF, while
> it should be ON, so that NM could finish setting the IP address and
> leave the device ready to use.
>
> While trying to identify where the problem is, I found that the
> function deferred_emit_carrier_state (nm-netlink-monitor.c) takes
> longer than expected to be called, and between the nl_cache_refill and
> the actual message handler, NM brought the device up, setting the
> carrier status to ON. By the time deferred_emit_carrier_state is called
> by the main loop, the cache data is no longer valid, which sets the
> usb0 carrier status to OFF again.
>
> Because of this behavior, NM is not configuring the device as it
> should, and the interface remains up without an IP address.
>
> This does not happen every time, though. When NM brings up the device
> after calling deferred_emit_carrier_state, everything works fine, which
> is why it looks like a race condition.
>
> The question is, what's the best way to fix this issue?
>
> I know that I could go in two directions: one is to check the cached
> data when getting a new event (like bringing up the usb0 interface),
> and the other is to call nl_cache_refill inside
> deferred_emit_carrier_state, slightly changing the current behavior.
>
> As I still don't understand much of the NM code (I started reading it
> in depth today), I would like to know which solution I should work on,
> so I can send you the patch later.
>
> Thanks!

_______________________________________________
NetworkManager-list mailing list
[email protected]
http://mail.gnome.org/mailman/listinfo/networkmanager-list
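
For anyone following the thread, the race described above can be
reproduced in a small stand-alone simulation. This is only a sketch in
Python and not NetworkManager code: the Kernel and Monitor classes, the
refill_on_emit flag and the boot() helper are all hypothetical names made
up for illustration, and the queue merely stands in for the main loop's
deferred handlers. It assumes the fix amounts to what the commit message
says, namely refreshing the cache at the moment the signals are emitted.

```python
from collections import deque


class Kernel:
    """Stand-in for the kernel's real link state (hypothetical)."""

    def __init__(self):
        self.carrier = {"usb0": False}


class Monitor:
    """Toy carrier monitor; an illustration, not NetworkManager's code."""

    def __init__(self, kernel, refill_on_emit):
        self.kernel = kernel
        self.refill_on_emit = refill_on_emit
        self.cache = {}
        self.pending = deque()    # stand-in for the main loop's deferred handlers
        self.device_carrier = {}  # what the device layer currently believes

    def refill_cache(self):
        # Copy the current kernel state into the cache.
        self.cache = dict(self.kernel.carrier)

    def request_status(self):
        if not self.refill_on_emit:
            self.refill_cache()   # old behavior: snapshot now, consume later
        self.pending.append(self.deferred_emit)

    def deferred_emit(self):
        if self.refill_on_emit:
            self.refill_cache()   # assumed fix: refresh right before emitting
        for iface, up in self.cache.items():
            self.device_carrier[iface] = up

    def link_event(self, iface, up):
        # A live link event updates the kernel state and the device layer.
        self.kernel.carrier[iface] = up
        self.device_carrier[iface] = up

    def run_pending(self):
        while self.pending:
            self.pending.popleft()()


def boot(refill_on_emit):
    kernel = Kernel()
    mon = Monitor(kernel, refill_on_emit)
    mon.request_status()          # carrier is still down at this point
    mon.link_event("usb0", True)  # device comes up before the handler runs
    mon.run_pending()             # deferred handler finally fires
    return mon.device_carrier["usb0"]


print(boot(refill_on_emit=False))  # False: stale snapshot overrides carrier ON
print(boot(refill_on_emit=True))   # True: cache is fresh when signals go out
```

With the snapshot taken at request time, the late handler overwrites the
carrier ON state with stale data, which matches the "carrier now OFF" line
at the end of the log. Refreshing inside the deferred handler keeps the
emitted state in step with the kernel.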
