Thanks for the reply.

My first theory was that there must be a routing problem, but after
thinking through it, I still can't see the problem. Maybe a network diagram
would be useful. Here's a quick drawing:
https://docs.google.com/drawings/d/1jo6834EdFt3SWwzRkrY-eWhwmFIDDYTiKFM8fpgMwSY/edit?usp=sharing
(If you prefer a PNG or PDF attachment instead, let me know.)

The dnsmasq server is dev-router (top right section of the diagram). It
previously had the IP address 172.18.15.1/24. When it had that address, the
DHCP client rack7-pdu1 (bottom center) would receive the expected lease for
172.18.15.106/24 and the gateway 172.18.15.1. The change that you're
questioning (yellow highlight) was to remove 172.18.15.1 from dev-router
and add it to usb-ms01 (upper left). (This is a "stack" of three switches,
but they behave as a single, logical layer 2 switch.)

In this new config, rack7-pdu1 does receive DHCP responses from dnsmasq and
it gets a lease. It's just the *wrong* lease, one from the DHCP pool, not
the reserved IP address that we expect it to get.
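
To make the symptom concrete, here is a rough sketch of my mental model. It assumes the server only considers dhcp-range networks that contain an address on the receiving interface. That is a simplification I'm using for illustration, not dnsmasq's actual code, and the function name and the trimmed address lists are made up for the example.

```python
import ipaddress

def available_ranges(interface_addrs, range_networks):
    """Return the configured networks that contain at least one interface address.

    Simplified model only: it ignores relays, tags, and shared-network.
    """
    ifaces = [ipaddress.ip_interface(a) for a in interface_addrs]
    nets = [ipaddress.ip_network(n) for n in range_networks]
    return [str(n) for n in nets if any(i.ip in n for i in ifaces)]

# Trimmed, illustrative subset of the networks served from enp2s0.
ranges = ["172.18.14.0/24", "172.18.15.0/24", "172.18.16.0/24", "172.18.128.0/17"]

# enp2s0 addresses before and after moving 172.18.15.1 to usb-ms01.
before = ["172.18.14.1/24", "172.18.15.1/24", "172.18.16.1/24", "172.18.128.1/17"]
after = ["172.18.14.1/24", "172.18.16.1/24", "172.18.128.1/17"]

print(available_ranges(before, ranges))
print(available_ranges(after, ranges))
```

In this model the 15 subnet drops out of the second list, which would match the "available DHCP subnet" log lines quoted below. It does not explain why adding 172.18.15.254/24 back to enp2s0 made no difference.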

> What is [rack7-pdu1] going to do when it wants to send a packet? It doesn't
> have any more specific route, so it wants to send it to the default route
> of 172.18.15.1. How does it do that? It sends an ARP out of its
> one-and-only interface asking "who has [172.18.15.1]" and there will be no
> answer, because [172.18.15.1] is no longer on that network segment, it's
> been moved "upstream".


But 172.18.15.1 *is* in the same segment. It's the address of the VLAN 199
interface of usb-ms01. Hosts at the bottom of the diagram, which are
downstream from a VLAN 199 access port, can ping 172.18.15.1.
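
As a sanity check on the addressing alone, here is a minimal sketch (the function name is hypothetical). It only tests whether the gateway falls inside the client's own subnet, which is the condition for the client to ARP for it directly. It says nothing about layer 2 reachability, which is what the pings demonstrate.

```python
import ipaddress

def gateway_is_on_link(host_cidr, gateway):
    """True if the gateway address lies inside the host's own subnet."""
    host = ipaddress.ip_interface(host_cidr)
    return ipaddress.ip_address(gateway) in host.network

# The lease we expect rack7-pdu1 to get, and the gateway now on usb-ms01.
print(gateway_is_on_link("172.18.15.106/24", "172.18.15.1"))

# An address outside the subnet, for contrast.
print(gateway_is_on_link("172.18.15.106/24", "192.168.15.1"))
```

The first call prints True and the second prints False, so with the expected lease the gateway is still an on-link address for the client.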

-Rich

On Wed, Oct 26, 2022 at 5:20 PM Simon Kelley <si...@thekelleys.org.uk>
wrote:

>
>
> On 25/10/2022 19:14, Rich Otero via Dnsmasq-discuss wrote:
> > We have an Ubuntu v16.04.5 server with dnsmasq v2.75. The server acts as
> > a router for approximately 140 IP subnets and dnsmasq provides DHCP and
> > DNS for those subnets. The server has two network interfaces, which are
> > basically an "upstream" interface (eno1) that has routes out of the LAN
> > and a "downstream" interface (enp2s0) that has an IP address in every
> > subnet that is managed by dnsmasq.
> >
> > First, I'll describe the configuration of the server. Most of the
> > downstream subnets are portions of 172.18.0.0/16.
> > The /16 is split into halves, 172.18.0.0/17 and 172.18.128.0/17. Then
> > the lower half is split into many /24s (172.18.0.0/24, 172.18.1.0/24,
> > 172.18.2.0/24, and so
> > on). The server's downstream interface then has the ".1" address of
> > every subnet:
> >
> >     (some lines are grepped out to make this easier to read)
> >     3: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc pfifo_fast state UP group default qlen 1000
> >          inet 10.139.100.1/24 brd 10.139.100.255 scope global enp2s0
> >          inet 10.139.200.1/23 brd 10.139.201.255 scope global enp2s0
> >          inet 10.43.10.1/24 brd 10.43.10.255 scope global enp2s0
> >          inet 10.43.6.1/24 brd 10.43.6.255 scope global enp2s0
> >          inet 10.43.12.1/24 brd 10.43.12.255 scope global enp2s0
> >          inet 10.43.16.1/24 brd 10.43.16.255 scope global enp2s0
> >          inet 10.43.17.1/24 brd 10.43.17.255 scope global enp2s0
> >          inet 172.18.0.1/24 brd 172.18.0.255 scope global enp2s0
> >          inet 172.18.1.1/24 brd 172.18.1.255 scope global enp2s0
> >          inet 172.18.2.1/24 brd 172.18.2.255 scope global enp2s0
> >
> >         < snip - every /24 of the lower /17 is setup this way >
> >
> >          inet 172.18.125.1/24 brd 172.18.125.255 scope global enp2s0
> >          inet 172.18.126.1/24 brd 172.18.126.255 scope global enp2s0
> >          inet 172.18.127.1/24 brd 172.18.127.255 scope global enp2s0
> >          inet 172.18.128.1/17 brd 172.18.255.255 scope global enp2s0
> >          inet6 fe80::225:90ff:fed6:368a/64 scope link
> >
> >
> > In /etc/default/dnsmasq, we enable the daemon and set
> > CONFIG_DIR=/etc/dnsmasq.d,.dpkg-dist,.dpkg-old,.dpkg-new. The main
> > dnsmasq configuration is in this file:
> >
> >     # /etc/dnsmasq.d/dev-router
> >     local=/dev.editshare.com/
> >     interface=enp2s0
> >     domain=dev.editshare.com
> >     host-record=dev.editshare.com,176.58.116.220
> >     auth-server=dev-router.editshare.boston,eno1
> >     auth-zone=dev.editshare.com,enp2s0,176.58.116.220
> >     server=/qa-ad.dev.editshare.com/172.18.3.99
> >     dhcp-option=option:domain-name,"dev.editshare.com editshare.boston"
> >     dhcp-option=option:domain-search,dev.editshare.com,editshare.boston
> >     dhcp-hostsdir=/etc/dhcp-hosts
> >     dhcp-optsdir=/etc/dhcp-opts
> >     hostsdir=/etc/static-hosts
> >     expand-hosts
> >
> >
> > And then we put additional configuration (dhcp-hosts, dhcp-range, and so
> > on) into separate files per subnet or supernet. For example, we can
> > examine the 172.18.15.0/24 subnet:
> >
> >     # /etc/dnsmasq.d/172.18.0.0-16
> >     dhcp-range=172.18.135.0,172.18.255.255,255.255.128.0
> >     dhcp-range=172.18.0.0,static,255.255.255.0
> >     dhcp-range=172.18.1.0,static,255.255.255.0
> >     dhcp-range=172.18.2.0,static,255.255.255.0
> >
> >         < snip - every /24 in this range is setup this way >
> >
> >     dhcp-range=172.18.14.0,static,255.255.255.0
> >     dhcp-range=172.18.15.0,static,255.255.255.0
> >     dhcp-range=172.18.16.0,static,255.255.255.0
> >
> >         < snip - every /24 in this range is setup this way >
> >
> >     dhcp-range=172.18.125.0,static,255.255.255.0
> >     dhcp-range=172.18.126.0,static,255.255.255.0
> >     dhcp-range=172.18.127.0,static,255.255.255.0
> >
> >
> >     (some dhcp-hosts are omitted here to make this easier to read)
> >
> >     # /etc/dhcp-hosts/172.18.15.0-24
> >     00:c0:b7:f1:0f:65,rack7-pdu1
> >     00:c0:b7:f1:a3:71,rack7-pdu2
> >
> >
> >     (some static-hosts are omitted here to make this easier to read)
> >
> >     # /etc/static-hosts/172.18.15.0-24
> >     172.18.15.106 rack7-pdu1
> >     172.18.15.107 rack7-pdu2
> >
> >
> > (From this point, I'll refer to 172.18.15.0/24 as "the 15 subnet.")
> >
> > With the above configuration in place, when rack7-pdu1 is connected to
> > the network, it is given the IP address 172.18.15.106/24, the default
> > gateway address 172.18.15.1, and
> > the DNS server address 172.18.15.1. That's the normal behavior that we
> > expect from this configuration, which has been in place for a few years.
> >
> > Now I'm introducing changes to that config: We need to decommission this
> > server as a router and as a DHCP and DNS server, and those services will
> > be migrated to other servers. The first step of our migration workflow
> > is to move the default gateway addresses to another router in the
> > network while continuing to use dnsmasq on the current server for DHCP
> > and DNS. The 15 subnet contains relatively few hosts and is not
> > sensitive to disruptions, so I am testing the changes for only that
> > subnet until we are satisfied that this process works. I removed
> > 172.18.15.1/24 from enp2s0 and added it to an
> > interface of a router upstream. After doing that, we could no longer
> > reach rack7-pdu1 at 172.18.15.106/24. We
> > suspected that the reason could be that the client wasn't being given a
> > default gateway by the DHCP server because the server was no longer
> > directly attached to the 15 subnet, so we tried using dhcp-option to
> > force including option:router in the DHCP response. We tried this four
> > different ways but could not produce the desired outcome:
> >
> > #1: set the tag for a dhcp-range, apply the tag to dhcp-option
> >
> >     # /etc/dnsmasq.d/172.18.0.0-16
> >     dhcp-range=set:172.18.15.0-24,172.18.15.0,static,255.255.255.0
> >     dhcp-option=tag:172.18.15.0-24,option:router,172.18.15.1
> >
> >
> > #2: set the tag for one dhcp-host, apply the tag to dhcp-range and
> > dhcp-opts
> >
> >     # /etc/dnsmasq.d/172.18.0.0-16
> >     dhcp-range=tag:test,172.18.15.0,static,255.255.255.0
> >     # /etc/dhcp-hosts/172.18.15.0-24
> >     00:c0:b7:f1:0f:65,set:test,rack7-pdu1
> >     # /etc/dhcp-opts/172.18.15.0-24
> >     tag:test,option:router,172.18.15.1
> >     # /etc/static-hosts/172.18.15.0-24
> >     172.18.15.106 rack7-pdu1
> >
> >
> > #3: set the tag for a dhcp-range, apply the tag to dhcp-range and
> > dhcp-opts
> >
> >     # /etc/dnsmasq.d/172.18.0.0-16
> >     dhcp-range=tag:test,set:test,172.18.15.0,static,255.255.255.0
> >     # /etc/dhcp-hosts/172.18.15.0-24
> >     00:c0:b7:f1:0f:65,rack7-pdu1
> >     # /etc/dhcp-opts/172.18.15.0-24
> >     tag:test,option:router,172.18.15.1
> >     # /etc/static-hosts/172.18.15.0-24
> >     172.18.15.106 rack7-pdu1
> >
> >
> > #4: set the tag for one dhcp-host, apply the tag to dhcp-opts
> >
> >     # /etc/dnsmasq.d/172.18.0.0-16
> >     dhcp-range=172.18.15.0,static,255.255.255.0
> >     # /etc/dhcp-hosts/172.18.15.0-24
> >     00:c0:b7:f1:0f:65,set:test,rack7-pdu1
> >     # /etc/dhcp-opts/172.18.15.0-24
> >     tag:test,option:router,172.18.15.1
> >     # /etc/static-hosts/172.18.15.0-24
> >     172.18.15.106 rack7-pdu1
> >
> >
> > Before each test, I used dhcp_release to revoke the client's existing
> > lease. As we watched the dnsmasq.leases file, we observed the lease being
> > removed and then approximately halfway through the lease period, we
> > observed dnsmasq give a new lease to the client with an IP address from
> > our "catch-all" IP address pool, between 172.18.135.0 and 172.18.255.255
> > instead of giving it 172.18.15.106 as expected. When we checked the log,
> > we saw that the 15 subnet was not being logged as an "available DHCP
> > subnet:"
> >
> >     < snip - every /24 between 172.18.18.0/24 and 172.18.127.0/24 was listed before this >
> >
> >     Oct 19 16:36:48 dnsmasq-dhcp[26972]: 993790843 available DHCP subnet: 172.18.17.0/255.255.255.0
> >     Oct 19 16:36:48 dnsmasq-dhcp[26972]: 993790843 available DHCP subnet: 172.18.16.0/255.255.255.0
> >     Oct 19 16:36:48 dnsmasq-dhcp[26972]: 993790843 available DHCP subnet: 172.18.14.0/255.255.255.0
> >     Oct 19 16:36:48 dnsmasq-dhcp[26972]: 993790843 available DHCP subnet: 172.18.13.0/255.255.255.0
> >     < snip - every /24 between 172.18.12.0/24 and 172.18.0.0/24 was listed after this >
> >
> >
> > Again we suspected that this must be due to the server not being
> > connected to 172.18.15.0/24. We tried adding
> > 172.18.15.254/24 to enp2s0 along with
> > configuration #3, but the outcome was unchanged.
> >
> > We kept reading the docs and searching for advice, and we found the
> > shared-network option that was added in v2.81. According to the docs,
> > this seems like it could solve our problem. Since we are using a
> > relatively old version of Ubuntu and we can't upgrade it at this time,
> > we downloaded the source for dnsmasq v2.87, compiled it on the server
> > (with the only modification being COPTS='-DHAVE_DBUS -DHAVE_DNSSEC'),
> > and replaced the v2.75 binary with the v2.87 binary. We tested both
> > shared-network syntaxes independently:
> >
> >     # first attempt: <interface>,<network-address>
> >     shared-network=enp2s0,172.18.15.0
> >
> >     # second attempt: <relay-address>,<network-address>
> >     shared-network=172.18.128.1,172.18.15.0
> >
> >
> > But the outcome was unchanged in both cases: The lease given to
> > rack7-pdu1 was not for 172.18.15.106. It was an address from the DHCP
> > pool in 172.18.128.0/17.
> >
> > I have also tried adding the IP address to the dhcp-hosts config like so:
> >
> > 00:c0:b7:f1:0f:65,set:test,172.18.15.106,rack7-pdu1
> >
> > But that also had no effect.
> >
> > At this point, I'm out of ideas. There must be something in my
> > configuration that isn't correct, but I can't figure out what it is. The
> > configuration syntax test always passes unless I've made an obvious
> > typo. Can anyone offer some help, please?
> >
>
>
> This looks like it might be a routing problem. The weasel words are "I
> removed 172.18.15.1/24 from enp2s0 and added it to an
> interface of a router upstream."
>
>
> Now, you have a host which might, or might not, get an address on
> 172.18.15.0/24 and a default route of 172.18.15.1. Let's assume you've
> got the shared-network incantations right and it does. What is it going
> to do when it wants to send a packet? It doesn't have any more specific
> route, so it wants to send it to the default route of 172.18.15.1. How
> does it do that? It sends an ARP out of its one-and-only interface
> asking "who has 172.18.15.1" and there will be no answer, because
> 172.18.15.1 is no longer on that network segment, it's been moved
> "upstream". A default route is only meaningful if it's on the same
> subnet as its owner.
>
> I think you need a different migration strategy.
>
>
> Simon.
>
> > -Rich
> >
> > _______________________________________________
> > Dnsmasq-discuss mailing list
> > Dnsmasq-discuss@lists.thekelleys.org.uk
> > https://lists.thekelleys.org.uk/cgi-bin/mailman/listinfo/dnsmasq-discuss
