Hi, Dan
Thanks for the replay.
The code of NetworkManger is downloaded from gnome's ftp server with version
0.7.0.
I've tried this NM-0.7 with both dhcpcd-4.0.10 and dhclient-3.1.0.

Following your advice, I add some log output code, and find the
device->state in function <nm_dhcp_manager_begin_transaction> is DHC_NBI (0)
or DHC_END (15).

In the case of dhcpcd-4.0.10, I see some log messages like: "wlan0: dhcpcd
already running on pid 1626 (/var/run/dhcpcd-wlan0.pid)", and NM forgets
some dhcpcd processes.
Also I see log messages like "dhcpcd[1630]: write signal 15: Bad file
descriptor", does this suggests that the dhcpcd process can't write the pid
file??

In the case of dhclient-3.1.0, (in my testing period), no <dhcp_watch_cb>
called (doesn't like the case of dhcpcd-4.0.10), but still, log messages
like "nm_dhcp_manager_handle_event(): Received DHCP event from unexpected
PID 19924 (expected 0)" appear.
I see only one log message like "dhclient: There is already a pid file
/usr/var/run/dhclient-wlan0.pid with pid 19911".
This time, NM forgets dhclient processes more easily. When I switch
connections between APs, the number of dhclient processes increases to 4 or
5 in a short time (which doesn't like the case of dhcpcd-4.0.10).

Do these suggest both of the two dhcp clients above having problems?
Is the pidfile reliable?(<nm_dhcp_manager_cancel_transaction_real> uses
device->pid to kill the dhcp process, but <nm_dhcp_client> uses pidfile)

BTW, thanks for your work, I will try the code with the tag 0.7.1rc1 later.

2009/2/19 Dan Williams <[email protected]>

> On Wed, 2009-02-18 at 17:10 +0800, cee1 wrote:
> > Hi,
> >      I'm using NetwokManager-0.7(using dhcpcd) with wifi, and see a
> > lot of messages like  "Received DHCP event from unexpected PID xxxx
> > (expected xxxx)" in the syslog.
>
> What version of NM specifically?  The final NM release from late
> November 2008, or a snapshot from some other point in time?
>
> >      I also see more than one dhcpcd processes running. Sometimes,
> > only one dhcpcd process lefts at last, but sometimes not. (searching
> > google, find the same problem someone has reported:
> > https://bugs.launchpad.net/ubuntu/+source/network-manager/+bug/255829)
> >      Following the source code(NM-0.7), in the function
> > <nm_dhcp_client_start> (nm-dhcp-dhcpcd.c), it kills the previous
> > dhcpcd process and saves the new one's pid in device->pid. But the
> > function <dhcp_watch_cb>, knowing the termination of the previous
> > dhcpd process, sets device->pid to zero. So the current dhcp process's
> > pid is lost, and <nm_dhcp_manager_handle_event> will print log like
> > "Received DHCP event from unexpected PID 8812 (expected 0)".
>
> Obviously the previous one isn't getting cleaned up correctly.  It might
> be that dhcpcd returns a state NM doesn't expect, and thus isn't getting
> cleaned up in nm_dhcp_manager_begin_transaction():
>
>        if (state_is_bound (device->state) || (device->state == DHC_START))
> {
>                /* Cancel any DHCP transaction already in progress */
>                nm_dhcp_manager_cancel_transaction_real (device);
>        }
>
> Basically, it should *always* be the case that a previous dhcp client
> instance is cleaned up before starting a new one, so the watch callback
> should certainly be removed before starting a new dhcp client run.
>
> >      Also the previous dhcpcd process's time_out handler may expire,
> > and affects the current running dhcpcd process?
>
> The timeout gets cleared whenever DHCP options get returned, or when the
> dhcp device gets destroyed, or when nm_dhcp_manager_cancel_transaction()
> is called to clean up any ongoing DHCP transaction.
> cancel_transaction() gets called when the device is deactivated.
>
> In any case, I've fixed up a few issues with the DHCP client handling
> code and made it somewhat more robust against this issue.  If you can
> possibly test the latest git, or at least figure out what state the DHCP
> client is left in (device->state) when
> nm_dhcp_manager_begin_transaction() gets called, that would be great.
>
> Dan
>
> >      It seems the function <supplicant_interface_acquire>
> > (nm-device-wifi.c) registers a function
> > <supplicant_iface_connection_state_cb> as the handler of the signal
> > "connection-state" .
> > And:
> > <supplicant_iface_connection_state_cb> shcedules
> > <supplicant_iface_connection_state_cb_handler>, which calls
> > <nm_device_activate_schedule_stage3_ip_config_start>, which shcedules
> > <nm_device_activate_stage3_ip_config_start> , which calls
> > <real_act_stage3_ip_config_start>, which calls
> > <nm_dhcp_manager_begin_transaction>.
> > So it is possiable that more than one dhcpcd processes running
> > meanwhile.
> >
> >     To fix this, shoud I do:
> > 1) Protect the corresponding data struct
> > or
> > 2) Add a clean step when a second dhcpcd process to be started
> >
> >      Any ideas? Thanks.
> > _______________________________________________
> > NetworkManager-list mailing list
> > [email protected]
> > http://mail.gnome.org/mailman/listinfo/networkmanager-list
>
>
_______________________________________________
NetworkManager-list mailing list
[email protected]
http://mail.gnome.org/mailman/listinfo/networkmanager-list

Reply via email to