Thanks for the reply. My first theory was that there must be a routing problem, but after thinking through it, I still can't see the problem. Maybe a network diagram would be useful. Here's a quick drawing: https://docs.google.com/drawings/d/1jo6834EdFt3SWwzRkrY-eWhwmFIDDYTiKFM8fpgMwSY/edit?usp=sharing (If you prefer a PNG or PDF attachment instead, let me know.)
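As a side check on the addressing alone, here is a minimal Python sketch (illustrative only, not something taken from the router). It asks whether the gateway 172.18.15.1 is on-link for a client, given the client's address and prefix. The reserved lease 172.18.15.106/24 and the /17 pool mask come from the configuration quoted below; the function name `on_link` and the sample pool address 172.18.135.10 are made up for illustration. This is address arithmetic only and says nothing about layer 2 reachability.

```python
import ipaddress


def on_link(host_with_prefix: str, gateway: str) -> bool:
    """Return True if `gateway` lies inside the subnet implied by the
    host's address and prefix length (hypothetical helper)."""
    iface = ipaddress.ip_interface(host_with_prefix)
    return ipaddress.ip_address(gateway) in iface.network


# Reserved lease from the thread: the gateway is inside 172.18.15.0/24.
print(on_link("172.18.15.106/24", "172.18.15.1"))

# Hypothetical address from the catch-all pool (mask 255.255.128.0):
# the client's subnet is then 172.18.128.0/17, which excludes 172.18.15.1.
print(on_link("172.18.135.10/17", "172.18.15.1"))
```

Under these assumptions, the first check succeeds and the second fails, so a client holding a pool lease would not treat 172.18.15.1 as an on-link gateway even though the address itself is reachable on VLAN 199.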
The dnsmasq server is dev-router (top right section of the diagram). It previously had the IP address 172.18.15.1/24. When it had that address, the DHCP client rack7-pdu1 (bottom center) would receive the expected lease for 172.18.15.106/24 and the gateway 172.18.15.1.

The change that you're questioning (yellow highlight) was to remove 172.18.15.1 from dev-router and add it to usb-ms01 (upper left). (This is a "stack" of three switches, but they behave as a single, logical layer 2 switch.) In this new config, rack7-pdu1 does receive DHCP responses from dnsmasq and it gets a lease. It's just the *wrong* lease, one from the DHCP pool, not the reserved IP address that we expect it to get.

> What is [rack7-pdu1] going to do when it wants to send a packet? It doesn't
> have any more specific route, so it wants to send it to the default route
> of 172.18.15.1. How does it do that? It sends an ARP out of its
> one-and-only interface asking "who has [172.18.15.1]" and there will be no
> answer, because [172.18.15.1] is no longer on that network segment, it's
> been moved "upstream".

But 172.18.15.1 *is* in the same segment. It's the address of the VLAN 199 interface of usb-ms01. Hosts at the bottom of the diagram, which are downstream from a VLAN 199 access port, can ping 172.18.15.1.

-Rich

On Wed, Oct 26, 2022 at 5:20 PM Simon Kelley <si...@thekelleys.org.uk> wrote:

> On 25/10/2022 19:14, Rich Otero via Dnsmasq-discuss wrote:
> > We have an Ubuntu v16.04.5 server with dnsmasq v2.75. The server acts as
> > a router for approximately 140 IP subnets and dnsmasq provides DHCP and
> > DNS for those subnets. The server has two network interfaces, which are
> > basically an "upstream" interface (eno1) that has routes out of the LAN
> > and a "downstream" interface (enp2s0) that has an IP address in every
> > subnet that is managed by dnsmasq.
> >
> > First, I'll describe the configuration of the server.
> > Most of the downstream subnets are portions of 172.18.0.0/16.
> > The /16 is split into halves, 172.18.0.0/17 and 172.18.128.0/17.
> > Then the lower half is split into many /24s (172.18.0.0/24,
> > 172.18.1.0/24, 172.18.2.0/24, and so on). The server's downstream
> > interface then has the ".1" address of every subnet:
> >
> > (some lines are grepped out to make this easier to read)
> >
> >     3: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc pfifo_fast state UP group default qlen 1000
> >         inet 10.139.100.1/24 brd 10.139.100.255 scope global enp2s0
> >         inet 10.139.200.1/23 brd 10.139.201.255 scope global enp2s0
> >         inet 10.43.10.1/24 brd 10.43.10.255 scope global enp2s0
> >         inet 10.43.6.1/24 brd 10.43.6.255 scope global enp2s0
> >         inet 10.43.12.1/24 brd 10.43.12.255 scope global enp2s0
> >         inet 10.43.16.1/24 brd 10.43.16.255 scope global enp2s0
> >         inet 10.43.17.1/24 brd 10.43.17.255 scope global enp2s0
> >         inet 172.18.0.1/24 brd 172.18.0.255 scope global enp2s0
> >         inet 172.18.1.1/24 brd 172.18.1.255 scope global enp2s0
> >         inet 172.18.2.1/24 brd 172.18.2.255 scope global enp2s0
> >
> >         < snip - every /24 of the lower /17 is setup this way >
> >
> >         inet 172.18.125.1/24 brd 172.18.125.255 scope global enp2s0
> >         inet 172.18.126.1/24 brd 172.18.126.255 scope global enp2s0
> >         inet 172.18.127.1/24 brd 172.18.127.255 scope global enp2s0
> >         inet 172.18.128.1/17 brd 172.18.255.255 scope global enp2s0
> >         inet6 fe80::225:90ff:fed6:368a/64 scope link
> >
> > In /etc/default/dnsmasq, we enable the daemon and set
> > CONFIG_DIR=/etc/dnsmasq.d,.dpkg-dist,.dpkg-old,.dpkg-new. The main
> > dnsmasq configuration is in this file:
> >
> >     # /etc/dnsmasq.d/dev-router
> >     local=/dev.editshare.com/
> >     interface=enp2s0
> >     domain=dev.editshare.com
> >     host-record=dev.editshare.com,176.58.116.220
> >     auth-server=dev-router.editshare.boston,eno1
> >     auth-zone=dev.editshare.com,enp2s0,176.58.116.220
> >     server=/qa-ad.dev.editshare.com/172.18.3.99
> >     dhcp-option=option:domain-name,"dev.editshare.com editshare.boston"
> >     dhcp-option=option:domain-search,dev.editshare.com,editshare.boston
> >     dhcp-hostsdir=/etc/dhcp-hosts
> >     dhcp-optsdir=/etc/dhcp-opts
> >     hostsdir=/etc/static-hosts
> >     expand-hosts
> >
> > And then we put additional configuration (dhcp-hosts, dhcp-range, and so
> > on) into separate files per subnet or supernet.
> > For example, we can examine the 172.18.15.0/24 subnet:
> >
> >     # /etc/dnsmasq.d/172.18.0.0-16
> >     dhcp-range=172.18.135.0,172.18.255.255,255.255.128.0
> >     dhcp-range=172.18.0.0,static,255.255.255.0
> >     dhcp-range=172.18.1.0,static,255.255.255.0
> >     dhcp-range=172.18.2.0,static,255.255.255.0
> >
> >     < snip - every /24 in this range is setup this way >
> >
> >     dhcp-range=172.18.14.0,static,255.255.255.0
> >     dhcp-range=172.18.15.0,static,255.255.255.0
> >     dhcp-range=172.18.16.0,static,255.255.255.0
> >
> >     < snip - every /24 in this range is setup this way >
> >
> >     dhcp-range=172.18.125.0,static,255.255.255.0
> >     dhcp-range=172.18.126.0,static,255.255.255.0
> >     dhcp-range=172.18.127.0,static,255.255.255.0
> >
> > (some dhcp-hosts are omitted here to make this easier to read)
> >
> >     # /etc/dhcp-hosts/172.18.15.0-24
> >     00:c0:b7:f1:0f:65,rack7-pdu1
> >     00:c0:b7:f1:a3:71,rack7-pdu2
> >
> > (some static-hosts are omitted here to make this easier to read)
> >
> >     # /etc/static-hosts/172.18.15.0-24
> >     172.18.15.106 rack7-pdu1
> >     172.18.15.107 rack7-pdu2
> >
> > (From this point, I'll refer to 172.18.15.0/24 as "the 15 subnet.")
> >
> > With the above configuration in place, when rack7-pdu1 is connected to
> > the network, it is given the IP address 172.18.15.106/24, the default
> > gateway address 172.18.15.1, and the DNS server address 172.18.15.1.
> > That's the normal behavior that we expect from this configuration,
> > which has been in place for a few years.
> >
> > Now I'm introducing changes to that config: We need to decommission this
> > server as a router and as a DHCP and DNS server, and those services will
> > be migrated to other servers. The first step of our migration workflow
> > is to move the default gateway addresses to another router in the
> > network while continuing to use dnsmasq on the current server for DHCP
> > and DNS.
> > The 15 subnet contains relatively few hosts and is not
> > sensitive to disruptions, so I am testing the changes for only that
> > subnet until we are satisfied that this process works. I removed
> > 172.18.15.1/24 from enp2s0 and added it to an interface of a router
> > upstream. After doing that, we could no longer reach rack7-pdu1 at
> > 172.18.15.106/24. We suspected that the reason could be that the client
> > wasn't being given a default gateway by the DHCP server because the
> > server was no longer directly attached to the 15 subnet, so we tried
> > using dhcp-option to force including option:router in the DHCP
> > response. We tried this four different ways but could not produce the
> > desired outcome:
> >
> > #1: set the tag for a dhcp-range, apply the tag to dhcp-option
> >
> >     # /etc/dnsmasq.d/172.18.0.0-16
> >     dhcp-range=set:172.18.15.0-24,172.18.15.0,static,255.255.255.0
> >     dhcp-option=tag:172.18.15.0-24,option:router,172.18.15.1
> >
> > #2: set the tag for one dhcp-host, apply the tag to dhcp-range and dhcp-opts
> >
> >     # /etc/dnsmasq.d/172.18.0.0-16
> >     dhcp-range=tag:test,172.18.15.0,static,255.255.255.0
> >     # /etc/dhcp-hosts/172.18.15.0-24
> >     00:c0:b7:f1:0f:65,set:test,rack7-pdu1
> >     # /etc/dhcp-opts/172.18.15.0-24
> >     tag:test,option:router,172.18.15.1
> >     # /etc/static-hosts/172.18.15.0-24
> >     172.18.15.106 rack7-pdu1
> >
> > #3: set the tag for a dhcp-range, apply the tag to dhcp-range and dhcp-opts
> >
> >     # /etc/dnsmasq.d/172.18.0.0-16
> >     dhcp-range=tag:test,set:test,172.18.15.0,static,255.255.255.0
> >     # /etc/dhcp-hosts/172.18.15.0-24
> >     00:c0:b7:f1:0f:65,rack7-pdu1
> >     # /etc/dhcp-opts/172.18.15.0-24
> >     tag:test,option:router,172.18.15.1
> >     # /etc/static-hosts/172.18.15.0-24
> >     172.18.15.106 rack7-pdu1
> >
> > #4: set the tag for one dhcp-host, apply the tag to dhcp-opts
> >
> >     # /etc/dnsmasq.d/172.18.0.0-16
> >     dhcp-range=172.18.15.0,static,255.255.255.0
> >     # /etc/dhcp-hosts/172.18.15.0-24
> >     00:c0:b7:f1:0f:65,set:test,rack7-pdu1
> >     # /etc/dhcp-opts/172.18.15.0-24
> >     tag:test,option:router,172.18.15.1
> >     # /etc/static-hosts/172.18.15.0-24
> >     172.18.15.106 rack7-pdu1
> >
> > Before each test, I used dhcp_release to revoke the client's existing
> > lease. As we watched the dnsmasq.leases file, we observed the lease being
> > removed and then approximately halfway through the lease period, we
> > observed dnsmasq give a new lease to the client with an IP address from
> > our "catch-all" IP address pool, between 172.18.135.0 and 172.18.255.255
> > instead of giving it 172.18.15.106 as expected. When we checked the log,
> > we saw that the 15 subnet was not being logged as an "available DHCP
> > subnet:"
> >
> >     < snip - every /24 between 172.18.18.0/24 and 172.18.127.0/24 was listed before this >
> >     Oct 19 16:36:48 dnsmasq-dhcp[26972]: 993790843 available DHCP subnet: 172.18.17.0/255.255.255.0
> >     Oct 19 16:36:48 dnsmasq-dhcp[26972]: 993790843 available DHCP subnet: 172.18.16.0/255.255.255.0
> >     Oct 19 16:36:48 dnsmasq-dhcp[26972]: 993790843 available DHCP subnet: 172.18.14.0/255.255.255.0
> >     Oct 19 16:36:48 dnsmasq-dhcp[26972]: 993790843 available DHCP subnet: 172.18.13.0/255.255.255.0
> >     < snip - every /24 between 172.18.12.0/24 and 172.18.0.0/24 was listed after this >
> >
> > Again we suspected that this must be due to the server not being
> > connected to 172.18.15.0/24. We tried adding 172.18.15.254/24 to enp2s0
> > along with configuration #3, but the outcome was unchanged.
> >
> > We kept reading the docs and searching for advice, and we found the
> > shared-network option that was added in v2.81. According to the docs,
> > this seems like it could solve our problem. Since we are using a
> > relatively old version of Ubuntu and we can't upgrade it at this time,
> > we downloaded the source for dnsmasq v2.87, compiled it on the server
> > (with the only modification being COPTS='-DHAVE_DBUS -DHAVE_DNSSEC'),
> > and replaced the v2.75 binary with the v2.87 binary. We tested both
> > shared-network syntaxes independently:
> >
> >     # first attempt: <interface>,<network-address>
> >     shared-network=enp2s0,172.18.15.0
> >
> >     # second attempt: <relay-address>,<network-address>
> >     shared-network=172.18.128.1,172.18.15.0
> >
> > But the outcome was unchanged in both cases: The lease given to
> > rack7-pdu1 was not for 172.18.15.106. It was an address from the DHCP
> > pool in 172.18.128.0/17.
> >
> > I have also tried adding the IP address to the dhcp-hosts config like so:
> >
> >     00:c0:b7:f1:0f:65,set:test,172.18.15.106,rack7-pdu1
> >
> > But that also had no effect.
> >
> > At this point, I'm out of ideas. There must be something in my
> > configuration that isn't correct, but I can't figure out what it is. The
> > configuration syntax test always passes unless I've made an obvious
> > typo. Can anyone offer some help, please?
> >
>
> This looks like it might be a routing problem. The weasel words are "I
> removed 172.18.15.1/24 from enp2s0 and added it to an
> interface of a router upstream."
>
> Now, you have a host which might, or might not, get an address on
> 172.18.15.1/24 and a default route of 172.18.15.1. Let's assume you've
> got the shared=network incantations right and it does. What is it going
> to do when it wants to send a packet? It doesn't have any more specific
> route, so it wants to send it to the default route of 172.18.15.1. How
> does it do that?
> It sends an ARP out of its one-and-only interface
> asking "who has 192.168.15.1" and there will be no answer, because
> 192.168.15.1 is no longer on that network segment, it's been moved
> "upstream". A default route is only meaningful if it's on the same
> subnet as its owner.
>
> I think you need a different migration strategy.
>
> Simon.
>
> > -Rich
> >
> > _______________________________________________
> > Dnsmasq-discuss mailing list
> > Dnsmasq-discuss@lists.thekelleys.org.uk
> > https://lists.thekelleys.org.uk/cgi-bin/mailman/listinfo/dnsmasq-discuss
>
> _______________________________________________
> Dnsmasq-discuss mailing list
> Dnsmasq-discuss@lists.thekelleys.org.uk
> https://lists.thekelleys.org.uk/cgi-bin/mailman/listinfo/dnsmasq-discuss