On 2 July 2015 at 07:57, Ted Unangst <[email protected]> wrote:
> this has been an ongoing problem, but I think it's gotten worse.
>
> When I change networks, I run dhclient again. It tells me I have a lease. For
> instance:
> DHCPREQUEST on em0 to 255.255.255.255
> DHCPREQUEST on em0 to 255.255.255.255
> DHCPDISCOVER on em0 - interval 3
> DHCPOFFER from 192.168.1.1 (48:f8:b3:05:5e:09)
> DHCPREQUEST on em0 to 255.255.255.255
> DHCPACK from 192.168.1.1 (48:f8:b3:05:5e:09)
> bound to 192.168.1.137 -- renewal in 43200 seconds.
>
> But then it doesn't actually assign the IP and I have no routes:
> Destination        Gateway            Flags   Refs      Use   Mtu  Prio Iface
> 127/8              127.0.0.1          UGRS       0        0 32768     8 lo0
> 127.0.0.1          127.0.0.1          UHl        1    26262 32768     1 lo0
> 224/4              127.0.0.1          URS        0        0 32768     8 lo0
>
> So I have to run it again.
> DHCPREQUEST on em0 to 255.255.255.255
> DHCPREQUEST on em0 to 255.255.255.255
> DHCPACK from 192.168.1.1 (48:f8:b3:05:5e:09)
> bound to 192.168.1.137 -- renewal in 43200 seconds.
>
> Now I have an IP and routes:
> default            192.168.1.1        UGS        0        0     -     8 em0
> ...
>
> There appears to be a race where dhclient changes my IP address, then decides
> to delete the old address, but actually deletes the new address.
>

Best guess based on this info:

It gets the ACK, deletes the old address and routes, and then adds the
new address and routes. It gets the routing message reporting the
expected address has been added, emits "bound to ...", and then gets
another routing message that tells it somebody is messing with the
interface and decides to exit.

If you turn out to be the first person able to reproduce this often
enough and willing to run some of the many diagnostic diffs I have
cast upon the waters for past reports of this issue I would be
delighted to make another attempt to find and avoid the race.

To begin, some of the following would help:

1) /var/log/daemon entries during an event.
2) /etc/dhclient.conf
3) /var/db/dhclient* files before and after switching networks
4) ifconfig and netstat before and after switching networks
5) tcpdump -i em0 -vv -X -s 2000 host 192.168.1.1
6) define 'gotten worse' -- % of 'failures'?
7) dmesg is always nice
8) running 'dhclient -L <path>' to record the actual leases offered
9) output of 'route -n monitor' during dhclient run on new network
10) compile dhclient with #define DEBUG turned on in dhcpd.h, and run
'dhclient -d' on new network
11) what if any M's are in the kernel you are running, especially any
of the recent network stack ones
12) a detailed description of how you are changing networks,
especially the timing between leaving one and joining the other and
the timing between running the various dhclient instances
13) 'pgrep -l -f dhclient' before and after each run of dhclient
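
To make the before/after captures in items 4 and 13 repeatable, something
like the following could be used. It is only a rough sketch: the helper
name snapshot(), the output directory and the file naming are made up for
illustration, and em0 is simply the interface from the log above.

```python
import subprocess
import time
from pathlib import Path

# Commands from items 4 and 13 of the list; "em0" is the interface seen in
# the quoted log. Adjust both for the machine being debugged.
COMMANDS = {
    "ifconfig": ["ifconfig", "em0"],
    "netstat": ["netstat", "-rn"],
    "pgrep": ["pgrep", "-l", "-f", "dhclient"],
}


def snapshot(label, outdir="dhclient-debug", commands=COMMANDS):
    """Run each command and save its output as <outdir>/<label>-<name>.txt.

    'label' is a free-form tag such as "before" or "after" (hypothetical
    convention, not anything dhclient knows about).
    """
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for name, argv in commands.items():
        try:
            result = subprocess.run(
                argv, capture_output=True, text=True, timeout=10
            )
            text = result.stdout + result.stderr
        except (OSError, subprocess.TimeoutExpired) as exc:
            # Record the failure instead of aborting the whole capture.
            text = f"could not run {argv!r}: {exc}\n"
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        path = out / f"{label}-{name}.txt"
        path.write_text(f"# {stamp} {' '.join(argv)}\n{text}")
        written.append(path)
    return written
```

Calling snapshot("before") ahead of the network switch and
snapshot("after") once dhclient has printed "bound to ..." leaves a pair
of files per command that can be diffed and attached to a report.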

Be warned, given current dhclient architecture and routing message
production there are very likely unsolvable races involved. In
particular the fact that routing messages do not contain the PID of
the program causing the issuance of the ADD and DELETE routing message
makes it theoretically impossible to guarantee the correct thing is
done with the routing messages. Each dhclient instance must guess if
the routing message is one caused by itself or a competing instance.
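
A toy model of that guessing problem, purely for illustration. None of
this is dhclient code: RouteMsg and Instance are invented names, and the
only property carried over from the description above is that a message
has no field identifying who caused it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouteMsg:
    """Simplified address message. Deliberately has no originator field."""
    kind: str       # "ADD" or "DELETE"
    ifname: str
    address: str


class Instance:
    """Hypothetical stand-in for one dhclient process on an interface."""

    def __init__(self, name, ifname):
        self.name = name
        self.ifname = ifname
        self.expected = set()   # messages this instance thinks it caused

    def request(self, kind, address):
        # Remember what we asked the kernel to do, and return the message
        # the kernel would broadcast to every listener as a result.
        msg = RouteMsg(kind, self.ifname, address)
        self.expected.add(msg)
        return msg

    def guess_is_mine(self, msg):
        # The only heuristic available: does it look like something I did?
        return msg in self.expected


old = Instance("old", "em0")
new = Instance("new", "em0")

# Both instances end up configuring the same leased address.
from_old = old.request("ADD", "192.168.1.137")
from_new = new.request("ADD", "192.168.1.137")

# The two messages are indistinguishable, so the old instance wrongly
# attributes the new instance's ADD to itself.
print(from_old == from_new)           # True
print(old.guess_is_mine(from_new))    # True
```

In this model the only way to tell the two messages apart would be an
extra field naming the originator, which is exactly what is missing.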

Another question is why do you run dhclient again? The link up/down
(and I assume there is a link up/down pair of routing messages) should
trigger a lease renewal.

.... Ken
