Hi, Dan
     I've reviewed  the code: when starts a new dhcp process (on the same
net interface), the previous one will be terminated. So the there should
always be one dhcp process for the same net interface, am I right?
     But in my test (still with tag NM-0.7.0 & dhclient-3.1.0), switching
between APs, some log message shows NM can't get pid from pidfile (the
pidfile doesn't  exist), but the *previous* dhcp process is actully alive .
This is really strange.

2009/2/19 cee1 <[email protected]>

> Hi, Dan
> Thanks for the replay.
> The code of NetworkManger is downloaded from gnome's ftp server with
> version 0.7.0.
> I've tried this NM-0.7 with both dhcpcd-4.0.10 and dhclient-3.1.0.
>
> Following your advice, I add some log output code, and find the
> device->state in function <nm_dhcp_manager_begin_transaction> is DHC_NBI (0)
> or DHC_END (15).
>
> In the case of dhcpcd-4.0.10, I see some log messages like: "wlan0: dhcpcd
> already running on pid 1626 (/var/run/dhcpcd-wlan0.pid)", and NM forgets
> some dhcpcd processes.
> Also I see log messages like "dhcpcd[1630]: write signal 15: Bad file
> descriptor", does this suggests that the dhcpcd process can't write the pid
> file??
>
> In the case of dhclient-3.1.0, (in my testing period), no <dhcp_watch_cb>
> called (doesn't like the case of dhcpcd-4.0.10), but still, log messages
> like "nm_dhcp_manager_handle_event(): Received DHCP event from unexpected
> PID 19924 (expected 0)" appear.
> I see only one log message like "dhclient: There is already a pid file
> /usr/var/run/dhclient-wlan0.pid with pid 19911".
> This time, NM forgets dhclient processes more easily. When I switch
> connections between APs, the number of dhclient processes increases to 4 or
> 5 in a short time (which doesn't like the case of dhcpcd-4.0.10).
>
> Do these suggest both of the two dhcp clients above having problems?
> Is the pidfile reliable?(<nm_dhcp_manager_cancel_transaction_real> uses
> device->pid to kill the dhcp process, but <nm_dhcp_client> uses pidfile)
>
> BTW, thanks for your work, I will try the code with the tag 0.7.1rc1 later.
>
> 2009/2/19 Dan Williams <[email protected]>
>
> On Wed, 2009-02-18 at 17:10 +0800, cee1 wrote:
>> > Hi,
>> >      I'm using NetwokManager-0.7(using dhcpcd) with wifi, and see a
>> > lot of messages like  "Received DHCP event from unexpected PID xxxx
>> > (expected xxxx)" in the syslog.
>>
>> What version of NM specifically?  The final NM release from late
>> November 2008, or a snapshot from some other point in time?
>>
>> >      I also see more than one dhcpcd processes running. Sometimes,
>> > only one dhcpcd process lefts at last, but sometimes not. (searching
>> > google, find the same problem someone has reported:
>> > https://bugs.launchpad.net/ubuntu/+source/network-manager/+bug/255829)
>> >      Following the source code(NM-0.7), in the function
>> > <nm_dhcp_client_start> (nm-dhcp-dhcpcd.c), it kills the previous
>> > dhcpcd process and saves the new one's pid in device->pid. But the
>> > function <dhcp_watch_cb>, knowing the termination of the previous
>> > dhcpd process, sets device->pid to zero. So the current dhcp process's
>> > pid is lost, and <nm_dhcp_manager_handle_event> will print log like
>> > "Received DHCP event from unexpected PID 8812 (expected 0)".
>>
>> Obviously the previous one isn't getting cleaned up correctly.  It might
>> be that dhcpcd returns a state NM doesn't expect, and thus isn't getting
>> cleaned up in nm_dhcp_manager_begin_transaction():
>>
>>        if (state_is_bound (device->state) || (device->state == DHC_START))
>> {
>>                /* Cancel any DHCP transaction already in progress */
>>                nm_dhcp_manager_cancel_transaction_real (device);
>>        }
>>
>> Basically, it should *always* be the case that a previous dhcp client
>> instance is cleaned up before starting a new one, so the watch callback
>> should certainly be removed before starting a new dhcp client run.
>>
>> >      Also the previous dhcpcd process's time_out handler may expire,
>> > and affects the current running dhcpcd process?
>>
>> The timeout gets cleared whenever DHCP options get returned, or when the
>> dhcp device gets destroyed, or when nm_dhcp_manager_cancel_transaction()
>> is called to clean up any ongoing DHCP transaction.
>> cancel_transaction() gets called when the device is deactivated.
>>
>> In any case, I've fixed up a few issues with the DHCP client handling
>> code and made it somewhat more robust against this issue.  If you can
>> possibly test the latest git, or at least figure out what state the DHCP
>> client is left in (device->state) when
>> nm_dhcp_manager_begin_transaction() gets called, that would be great.
>>
>> Dan
>>
>> >      It seems the function <supplicant_interface_acquire>
>> > (nm-device-wifi.c) registers a function
>> > <supplicant_iface_connection_state_cb> as the handler of the signal
>> > "connection-state" .
>> > And:
>> > <supplicant_iface_connection_state_cb> shcedules
>> > <supplicant_iface_connection_state_cb_handler>, which calls
>> > <nm_device_activate_schedule_stage3_ip_config_start>, which shcedules
>> > <nm_device_activate_stage3_ip_config_start> , which calls
>> > <real_act_stage3_ip_config_start>, which calls
>> > <nm_dhcp_manager_begin_transaction>.
>> > So it is possiable that more than one dhcpcd processes running
>> > meanwhile.
>> >
>> >     To fix this, shoud I do:
>> > 1) Protect the corresponding data struct
>> > or
>> > 2) Add a clean step when a second dhcpcd process to be started
>> >
>> >      Any ideas? Thanks.
>> > _______________________________________________
>> > NetworkManager-list mailing list
>> > [email protected]
>> > http://mail.gnome.org/mailman/listinfo/networkmanager-list
>>
>>
>
_______________________________________________
NetworkManager-list mailing list
[email protected]
http://mail.gnome.org/mailman/listinfo/networkmanager-list

Reply via email to