Send dhcp-users mailing list submissions to dhcp-users@lists.isc.org
To subscribe or unsubscribe via the World Wide Web, visit
    https://lists.isc.org/mailman/listinfo/dhcp-users
or, via email, send a message with subject or body 'help' to
    dhcp-users-requ...@lists.isc.org

You can reach the person managing the list at
    dhcp-users-ow...@lists.isc.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of dhcp-users digest..."

Today's Topics:

   1. Re: Question (Leslie Rhorer)
   2. Re: Question (Leslie Rhorer)

----------------------------------------------------------------------

Message: 1
Date: Fri, 3 Jun 2022 04:03:00 -0500
From: Leslie Rhorer <lesrho...@siliconventures.net>
To: dhcp-users@lists.isc.org
Subject: Re: Question
Message-ID: <f26c27e5-88f3-0f28-5417-776d0e913...@siliconventures.net>
Content-Type: text/plain; charset="utf-8"; Format="flowed"

    Well, I found one error left over from when this was a /24 network.
The range definition on the secondary server was from 192.168.1.220 to
192.168.1.240, instead of 192.168.0.200 to 192.168.0.240. You can see
the error in the backup.conf.gz file. I am not sure what issues this
would cause, other than of course serving addresses in a range I want
to change.

On 6/3/2022 2:45 AM, Glenn Satchell wrote:
> Hi Leslie,
>
> Ok I can see a packet flow in that pcap file between the two servers.
> It shows a TCP packet from 192.168.1.50 port 46869 with the SYN [S]
> flag to 192.168.1.51 port 647 - so that's trying to open the connection.
> 192.168.1.51 responds with RST [R] flag, so 192.168.1.50 tries again,
> and on it goes. So looks like 192.168.1.51 is not listening on that
> port. There's no failover connection being established. So we have
> that to sort out first.
>
> $ tcpdump -r secondary.pcap -v
> reading from file secondary.pcap, link-type EN10MB (Ethernet)
> 16:23:34.924575 IP (tos 0x0, ttl 64, id 46213, offset 0, flags [DF], proto TCP (6), length 60)
>     192.168.1.50.46869 > 192.168.1.51.647: Flags [S], cksum 0xdfce (correct), seq 4009562500, win 64240, options [mss 1460,sackOK,TS val 3809692760 ecr 0,nop,wscale 7], length 0
> 16:23:34.924599 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
>     192.168.1.51.647 > 192.168.1.50.46869: Flags [R.], cksum 0x71fb (correct), seq 0, ack 4009562501, win 0, length 0
> 16:23:39.925032 IP (tos 0x0, ttl 64, id 20478, offset 0, flags [DF], proto TCP (6), length 60)
>     192.168.1.50.57529 > 192.168.1.51.647: Flags [S], cksum 0x995f (correct), seq 2790876011, win 64240, options [mss 1460,sackOK,TS val 3809697760 ecr 0,nop,wscale 7], length 0
> 16:23:39.925054 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
>     192.168.1.51.647 > 192.168.1.50.57529: Flags [R.], cksum 0x3f14 (correct), seq 0, ack 2790876012, win 0, length 0
>
> When I look at it with wireshark it's the same but perhaps shown a
> little more clearly
>
> 1   0.000000   192.168.1.50  192.168.1.51  TCP  74  46869 → 647 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1 TSval=3809692760 TSecr=0 WS=128
> 2   0.000024   192.168.1.51  192.168.1.50  TCP  54  647 → 46869 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
> 3   5.000457   192.168.1.50  192.168.1.51  TCP  74  57529 → 647 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1 TSval=3809697760 TSecr=0 WS=128
> 4   5.000479   192.168.1.51  192.168.1.50  TCP  54  647 → 57529 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
> 5  10.000924   192.168.1.50  192.168.1.51  TCP  74  51935 → 647 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1 TSval=3809702760 TSecr=0 WS=128
> 6  10.000945   192.168.1.51  192.168.1.50  TCP  54  647 → 51935 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
> 7  15.001390   192.168.1.50  192.168.1.51  TCP  74  57497 → 647 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1 TSval=3809707761 TSecr=0 WS=128
>
> Can you please post the failover peer definitions for both dhcp
> servers, I think we need to check that they make sense. Second the
> interface configs for that interface on each server, output from "ip
> addr show ethX" or whatever the correct interface name is please. We
> need to be sure the address, netmask, etc, match up.
>
> So that packet capture is very useful. It's pinpointed an issue
> straight away.
>
> regards,
> Glenn
>
> On 2022-06-03 16:37, Leslie Rhorer wrote:
>>     I am seeing a listening connection on the primary server on 647,
>> but nothing on the secondary. I have included the tcpdump from the
>> secondary on port 647 as a gz file. 'Still waiting on the dumps on
>> ports 67 and 68 (it's taking a while for 100 packets to pass)
>>
>> On 6/3/2022 1:03 AM, Glenn Satchell wrote:
>>> Hi Leslie,
>>>
>>> I know about capturing packets on a 10G interface :) many gigabytes
>>> in a few seconds...
>>>
>>> So you need to use filters when capturing, eg with tcpdump
>>>
>>>   tcpdump -i eth0 host <other dhcp server IP or name> and tcp port 647
>>>
>>> will only capture the failover traffic on eth0 directed to or from
>>> the other server, and ignore the rest.
>>>
>>>   tcpdump udp and port 68 or port 67
>>>
>>> will capture dhcp packets.
>>>
>>> You can add options like "-c 100" to stop after 100 packets are
>>> captured. "-w filename" will capture to a file and you can copy this
>>> file to your desktop and use wireshark to read it.
>>>
>>> With failover, it's better to restart one dhcp server, wait for it
>>> to sync, then restart the other one. If you shut down both and then
>>> start them, then they come up in recover mode.
>>>
>>> Also looking at failover connections:
>>>
>>>   netstat -ant | grep 647
>>>
>>> should show an established connection between the two servers.
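The listener check described above can also be made from the other server's side. What follows is only a minimal Python sketch, not part of ISC DHCP: the helper name `probe_tcp_port` is hypothetical. It attempts a TCP connection and reports whether the port accepts it or answers with a reset, which is the SYN/RST pattern seen in the capture when nothing is listening on the failover port.

```python
import socket


def probe_tcp_port(host: str, port: int, timeout: float = 2.0) -> str:
    """Try one TCP connection to host:port and classify the outcome.

    Hypothetical helper for illustration only. Returns:
      "open"      - the three-way handshake completed (something is listening)
      "refused"   - the peer answered the SYN with a RST (nothing listening)
      "timeout"   - no answer within `timeout` seconds (e.g. filtered)
      "error: …"  - any other socket-level failure
    """
    try:
        # create_connection performs the full handshake, then we close at once.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
```

For example, `probe_tcp_port("192.168.1.51", 647)` run on the primary would be expected to return "refused" for as long as dhcpd on the secondary is not listening on the failover port, and "open" once it is.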
>>>
>>> regards,
>>> Glenn
>>>
>>> On 2022-06-03 15:39, Leslie Rhorer wrote:
>>>
>>>> On 6/2/2022 11:30 PM, Gregory Sloop wrote:
>>>>
>>>>> Are you seeing balance messages every hour as the two re-balance
>>>>> the available lease pool?
>>>> No, I don't think so. It has only been a couple of hours since I
>>>> have had both online, however.
>>>>
>>>>> You say they are both handling leases properly, but how do you
>>>>> know this? (That a machine gets a lease from somewhere is not good
>>>>> evidence.)
>>>>
>>>> Do you mean because some other machine / device could be issuing
>>>> leases? No. In that case,
>>>>
>>>> 1. Killing both servers would not take down any DHCP clients. If
>>>> both servers are shut down, DHCP clients start failing in about an
>>>> hour, until they are all dead.
>>>>
>>>> 2. DHCP responses on the LAN stop completely the moment both
>>>> servers are taken down.
>>>>
>>>> 3. No other machine would know anything about the list of
>>>> dynamically assigned fixed IP addresses in dhcpd.static. None of
>>>> the addresses of any of the clients ever change.
>>>>
>>>> 4. Whenever one server is shut down, the other responds with tons
>>>> of responses in the log.
>>>>
>>>>> A packet capture in front of the secondary might be helpful to see
>>>>> what traffic is passing - both to the peer and to clients.
>>>> While not impossible, that is a bit easier said than done. The
>>>> links between the servers are 10G. I can look into it.
>>>>
>>>>> (I hate making captures, at least as much as the next person, but
>>>>> dang if they don't, nearly always, show something that was
>>>>> different than I assumed. So, I've just gotten a lot less averse
>>>>> to getting captures. Yeah, they'll probably take me extra time to
>>>>> setup and get and paw through, [all when I could be fixin' stuff!]
>>>>> but they can save hours or days of fruitless searching for a fix,
>>>>> when I don't even really *know* what's wrong yet. Don't know about
>>>>> anyone else, but fixing problems gets a whole lot easier when I
>>>>> actually know what's wrong, or at least have a good idea what's
>>>>> going on. :)
>>>>
>>>> Agreed, although when an interface is chunking away at over 10,000
>>>> packets per second...
>>>>
>>>> If something doesn't break loose, I will see about loading Wireshark.
-------------- next part --------------
3: enp6s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 08:62:66:a1:40:93 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.50/23 brd 192.168.1.255 scope global noprefixroute enp6s0
       valid_lft forever preferred_lft forever
    inet6 fe80::a62:66ff:fea1:4093/64 scope link
       valid_lft forever preferred_lft forever
-------------- next part --------------
2: enp11s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 50:46:5d:65:15:9c brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.51/23 brd 192.168.1.255 scope global noprefixroute enp11s0
       valid_lft forever preferred_lft forever
    inet6 fe80::5246:5dff:fe65:159c/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
-------------- next part --------------
A non-text attachment was scrubbed...
Name: backup.conf.gz
Type: application/gzip
Size: 1719 bytes
Desc: not available
URL: <https://lists.isc.org/pipermail/dhcp-users/attachments/20220603/6563d6dc/attachment-0002.gz>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: primary.conf.gz
Type: application/gzip
Size: 1718 bytes
Desc: not available
URL: <https://lists.isc.org/pipermail/dhcp-users/attachments/20220603/6563d6dc/attachment-0003.gz>

------------------------------

Message: 2
Date: Fri, 3 Jun 2022 04:33:23 -0500
From: Leslie Rhorer <lesrho...@siliconventures.net>
To: dhcp-users@lists.isc.org
Subject: Re: Question
Message-ID: <1504bf5d-b61a-31e3-14b5-f1cf860e7...@siliconventures.net>
Content-Type: text/plain; charset=UTF-8; format=flowed

    Oi, veh!
Something else has died, now, and I don't know what or how. The only
change I made was the one listed below, and now dhcpd won't run from
/etc/init.d/isc-dhcp-server. Rather, it runs, but then it quits. I can
run it manually from the CL with exactly the same syntax, and it remains
up, but then the primary server quits.

On 6/3/2022 4:03 AM, Leslie Rhorer wrote:
> Well, I found one error left over from when this was a /24 network.
> The range definition on the secondary server was from 192.168.1.220 to
> 192.168.1.240, instead of 192.168.0.200 to 192.168.0.240. You can see
> the error in the backup.conf.gz file. I am not sure what issues this
> would cause, other than of course serving addresses in a range I want
> to change.

------------------------------

Subject: Digest Footer

_______________________________________________
ISC funds the development of this software with paid support
subscriptions. Contact us at https://www.isc.org/contact/ for more
information.

dhcp-users mailing list
dhcp-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/dhcp-users

------------------------------

End of dhcp-users Digest, Vol 164, Issue 9
******************************************