On Thu, 2009-02-19 at 20:37 +0800, cee1 wrote: > Hi, Dan > I've reviewed the code: when starts a new dhcp process (on the > same net interface), the previous one will be terminated. So the there > should always be one dhcp process for the same net interface, am I > right?
Correct. > But in my test (still with tag NM-0.7.0 & dhclient-3.1.0), > switching between APs, some log message shows NM can't get pid from > pidfile (the pidfile doesn't exist), but the *previous* dhcp process > is actully alive . This is really strange. Yeah, it's strange. Are you running something like SELinux or AppArmor? Dan > 2009/2/19 cee1 <[email protected]> > Hi, Dan > Thanks for the replay. > The code of NetworkManger is downloaded from gnome's ftp > server with version 0.7.0. > I've tried this NM-0.7 with both dhcpcd-4.0.10 and > dhclient-3.1.0. > > Following your advice, I add some log output code, and find > the device->state in function > <nm_dhcp_manager_begin_transaction> is DHC_NBI (0) or DHC_END > (15). > > In the case of dhcpcd-4.0.10, I see some log messages like: > "wlan0: dhcpcd already running on pid 1626 > (/var/run/dhcpcd-wlan0.pid)", and NM forgets some dhcpcd > processes. > Also I see log messages like "dhcpcd[1630]: write signal 15: > Bad file descriptor", does this suggests that the dhcpcd > process can't write the pid file?? > > In the case of dhclient-3.1.0, (in my testing period), no > <dhcp_watch_cb> called (doesn't like the case of > dhcpcd-4.0.10), but still, log messages like > "nm_dhcp_manager_handle_event(): Received DHCP event from > unexpected PID 19924 (expected 0)" appear. > I see only one log message like "dhclient: There is already a > pid file /usr/var/run/dhclient-wlan0.pid with pid 19911". > This time, NM forgets dhclient processes more easily. When I > switch connections between APs, the number of dhclient > processes increases to 4 or 5 in a short time (which doesn't > like the case of dhcpcd-4.0.10). > > Do these suggest both of the two dhcp clients above having > problems? > Is the pidfile > reliable?(<nm_dhcp_manager_cancel_transaction_real> uses > device->pid to kill the dhcp process, but <nm_dhcp_client> > uses pidfile) > > BTW, thanks for your work, I will try the code with the tag > 0.7.1rc1 later. > > 2009/2/19 Dan Williams <[email protected]> > > > On Wed, 2009-02-18 at 17:10 +0800, cee1 wrote: > > Hi, > > I'm using NetwokManager-0.7(using dhcpcd) with > wifi, and see a > > lot of messages like "Received DHCP event from > unexpected PID xxxx > > (expected xxxx)" in the syslog. > > > What version of NM specifically? The final NM release > from late > November 2008, or a snapshot from some other point in > time? > > > I also see more than one dhcpcd processes > running. Sometimes, > > only one dhcpcd process lefts at last, but sometimes > not. (searching > > google, find the same problem someone has reported: > > > > https://bugs.launchpad.net/ubuntu/+source/network-manager/+bug/255829) > > Following the source code(NM-0.7), in the > function > > <nm_dhcp_client_start> (nm-dhcp-dhcpcd.c), it kills > the previous > > dhcpcd process and saves the new one's pid in > device->pid. But the > > function <dhcp_watch_cb>, knowing the termination of > the previous > > dhcpd process, sets device->pid to zero. So the > current dhcp process's > > pid is lost, and <nm_dhcp_manager_handle_event> will > print log like > > "Received DHCP event from unexpected PID 8812 > (expected 0)". > > > Obviously the previous one isn't getting cleaned up > correctly. It might > be that dhcpcd returns a state NM doesn't expect, and > thus isn't getting > cleaned up in nm_dhcp_manager_begin_transaction(): > > if (state_is_bound (device->state) || > (device->state == DHC_START)) { > /* Cancel any DHCP transaction already > in progress */ > nm_dhcp_manager_cancel_transaction_real > (device); > } > > Basically, it should *always* be the case that a > previous dhcp client > instance is cleaned up before starting a new one, so > the watch callback > should certainly be removed before starting a new dhcp > client run. > > > Also the previous dhcpcd process's time_out > handler may expire, > > and affects the current running dhcpcd process? > > > The timeout gets cleared whenever DHCP options get > returned, or when the > dhcp device gets destroyed, or when > nm_dhcp_manager_cancel_transaction() > is called to clean up any ongoing DHCP transaction. > cancel_transaction() gets called when the device is > deactivated. > > In any case, I've fixed up a few issues with the DHCP > client handling > code and made it somewhat more robust against this > issue. If you can > possibly test the latest git, or at least figure out > what state the DHCP > client is left in (device->state) when > nm_dhcp_manager_begin_transaction() gets called, that > would be great. > > Dan > > > It seems the function > <supplicant_interface_acquire> > > (nm-device-wifi.c) registers a function > > <supplicant_iface_connection_state_cb> as the > handler of the signal > > "connection-state" . > > And: > > <supplicant_iface_connection_state_cb> shcedules > > <supplicant_iface_connection_state_cb_handler>, > which calls > > > <nm_device_activate_schedule_stage3_ip_config_start>, > which shcedules > > <nm_device_activate_stage3_ip_config_start> , which > calls > > <real_act_stage3_ip_config_start>, which calls > > <nm_dhcp_manager_begin_transaction>. > > So it is possiable that more than one dhcpcd > processes running > > meanwhile. > > > > To fix this, shoud I do: > > 1) Protect the corresponding data struct > > or > > 2) Add a clean step when a second dhcpcd process to > be started > > > > Any ideas? Thanks. > > > _______________________________________________ > > NetworkManager-list mailing list > > [email protected] > > > http://mail.gnome.org/mailman/listinfo/networkmanager-list > > > _______________________________________________ NetworkManager-list mailing list [email protected] http://mail.gnome.org/mailman/listinfo/networkmanager-list
