Hi Frank,

On 11/23/2017 12:05 AM, Frank Even wrote:
> Seems despite sending as text, alignment is an issue. If needing a
> better view of the data, I've tossed it in a gist as well:
> https://gist.github.com/dfjkl/1b45f83f8b0fd427191a8d63a0e6aaa5

Thank you for your detailed report! This is indeed a bug: the last
added server was always sorted as if it had an order of 0. I just opened
a new pull request [1] to fix this issue; it should be fixed in master
soon. We provide packages at [2] if you want to give master a try once
the issue has been fixed.

[1]: https://github.com/PowerDNS/pdns/pull/6043
[2]: https://repo.powerdns.com

Best regards,

Remi

> On Wed, Nov 22, 2017 at 3:51 PM, Frank Even
> <lists+powerdns....@elitists.org> wrote:
>> To Whomever May Be Concerned,
>>
>> In testing dnsdist (version 1.2.0) out on a new system configured with
>> the ServerPolicy(firstAvailable), we noticed what seems like a pretty
>> big bug. We've got a lot of nodes servicing anycast addresses,
>> converting from named listening on those addresses to just listening
>> on the local addresses and then letting dnsdist handle listening on
>> the anycast addresses. In this case, we've got a group of 24 servers
>> configured as backends to dnsdist in geographically diverse areas in
>> an ordered config serving DNS requests from
>> localhost/localcluster/remote systems. On a local node, my test was
>> running a "dig +short @anycastaddr google.com" in a loop. What we end
>> up seeing is that when we kill named on the local system, queries jump
>> to the last system in the ordered list. It does not matter what
>> system is there or how latent it is (we tried changing up the
>> configuration to different systems), or the order number configured
>> (these were tested at 100, 90, and now 9 just to ensure it wasn't an
>> error in sorting numbers). IF we set the last system in the list to
>> administratively DOWN, then the ordering works as expected. When the
>> final server in the list is put back in service, queries jump back to
>> the very last system in the list until the local named instance is
>> brought back up and then queries return there.
>> Some data below demonstrating this:
>>
>> # Queries going to localhost, first host in the ordered list.
>>
>> > showServers()
>> #    Name  Address         State  Qps  Qlim  Ord  Wt  Queries  Drops  Drate  Lat    Outstanding  Pools
>> 0          127.0.0.1:53    up     1.0  0     0    1   68       0      0.0    0.6    0
>> 1          10.3.5.13:53    up     0.0  0     5    1   0        0      0.0    0.0    0
>> 2          10.3.5.14:53    up     0.0  0     5    1   0        0      0.0    0.0    0
>> 3          10.6.3.1:53     up     0.0  0     9    1   0        0      0.0    0.0    0
>> 4          10.6.3.2:53     up     0.0  0     9    1   0        0      0.0    0.0    0
>> 5          10.6.3.3:53     up     0.0  0     9    1   0        0      0.0    0.0    0
>> 6          10.6.3.65:53    up     0.0  0     9    1   0        0      0.0    0.0    0
>> 7          10.6.3.66:53    up     0.0  0     9    1   0        0      0.0    0.0    0
>> 8          10.6.3.67:53    up     0.0  0     9    1   0        0      0.0    0.0    0
>> 9          10.3.8.27:53    up     0.0  0     9    1   0        0      0.0    0.0    0
>> 10         10.3.8.47:53    up     0.0  0     9    1   0        0      0.0    0.0    0
>> 11         10.2.7.15:53    down   0.0  0     9    1   0        0      0.0    0.0    0
>> 12         10.2.7.16:53    down   0.0  0     9    1   0        0      0.0    0.0    0
>> 13         10.2.7.17:53    down   0.0  0     9    1   0        0      0.0    0.0    0
>> 14         10.2.7.18:53    down   0.0  0     9    1   0        0      0.0    0.0    0
>> 15         10.2.7.19:53    down   0.0  0     9    1   0        0      0.0    0.0    0
>> 16         10.2.7.20:53    down   0.0  0     9    1   0        0      0.0    0.0    0
>> 17         10.8.3.2:53     up     0.0  0     9    1   0        0      0.0    0.0    0
>> 18         10.8.3.65:53    up     0.0  0     9    1   0        0      0.0    0.0    0
>> 19         10.8.3.66:53    up     0.0  0     9    1   0        0      0.0    0.0    0
>> 20         10.4.3.1:53     up     0.0  0     9    1   0        0      0.0    0.0    0
>> 21         10.4.3.2:53     up     0.0  0     9    1   0        0      0.0    0.0    0
>> 22         10.4.3.66:53    up     0.0  0     9    1   0        0      0.0    0.0    0
>> 23         10.4.3.65:53    up     0.0  0     9    1   0        0      0.0    0.0    0
>> 24         10.8.3.1:53     up     0.0  0     9    1   0        0      0.0    0.0    0
>> All                               0.0                 68       0
>>
>> # Dropping local named instance
>>
>> ~]# service named stop ; dnsdist -c
>> Redirecting to /bin/systemctl stop named.service
>> > showServers()
>> #    Name  Address         State  Qps  Qlim  Ord  Wt  Queries  Drops  Drate  Lat    Outstanding  Pools
>> 0          127.0.0.1:53    up     1.1  0     0    1   106      0      0.0    0.5    1
>> 1          10.3.5.13:53    up     0.0  0     5    1   0        0      0.0    0.0    0
>> 2          10.3.5.14:53    up     0.0  0     5    1   0        0      0.0    0.0    0
>> 3          10.6.3.1:53     up     0.0  0     9    1   0        0      0.0    0.0    0
>> <snip>
>> 22         10.4.3.66:53    up     0.0  0     9    1   0        0      0.0    0.0    0
>> 23         10.4.3.65:53    up     0.0  0     9    1   0        0      0.0    0.0    0
>> 24         10.8.3.1:53     up     0.0  0     9    1   0        0      0.0    0.0    0
>> All                               1.0                 106      0
>>
>> # dnsdist drops localhost and local system IP for this system out of
>> # rotation.
>> # NOTE - queries are now diverted to the last node in the list. This
>> # system is ordered higher than node 1, still up and receiving
>> # requests happily. It's also of course less latent since it's one
>> # hop away. Yet, we're crossing an ocean here for resolution.
>>
>> > showServers()
>> #    Name  Address         State  Qps  Qlim  Ord  Wt  Queries  Drops  Drate  Lat    Outstanding  Pools
>> 0          127.0.0.1:53    down   0.0  0     0    1   107      2      0.0    0.5    0
>> 1          10.3.5.13:53    up     0.0  0     5    1   0        0      0.0    0.0    0
>> 2          10.3.5.14:53    down   0.0  0     5    1   0        0      0.0    0.0    0
>> 3          10.6.3.1:53     up     0.0  0     9    1   0        0      0.0    0.0    0
>> <snip>
>> 22         10.4.3.66:53    up     0.0  0     9    1   0        0      0.0    0.0    0
>> 23         10.4.3.65:53    up     0.0  0     9    1   0        0      0.0    0.0    0
>> 24         10.8.3.1:53     up     0.8  0     9    1   20       0      0.0    24.7   0
>> All                               0.0                 127      2
>>
>> # Forcing down the last server in the list
>>
>> > getServer(24):setDown()
>> > showServers()
>> #    Name  Address         State  Qps  Qlim  Ord  Wt  Queries  Drops  Drate  Lat    Outstanding  Pools
>> 0          127.0.0.1:53    down   0.0  0     0    1   107      2      0.0    0.5    0
>> 1          10.3.5.13:53    up     0.0  0     5    1   2        0      0.0    0.0    0
>> 2          10.3.5.14:53    down   0.0  0     5    1   0        0      0.0    0.0    0
>> 3          10.6.3.1:53     up     0.0  0     9    1   0        0      0.0    0.0    0
>> <snip>
>> 22         10.4.3.66:53    up     0.0  0     9    1   0        0      0.0    0.0    0
>> 23         10.4.3.65:53    up     0.0  0     9    1   0        0      0.0    0.0    0
>> 24         10.8.3.1:53     DOWN   0.8  0     9    1   34       0      0.0    39.8   0
>> All                               0.0                 143      2
>>
>> # Traffic shifts to the next lowest ordered system (#1), as it should.
>>
>> > showServers()
>> #    Name  Address         State  Qps  Qlim  Ord  Wt  Queries  Drops  Drate  Lat    Outstanding  Pools
>> 0          127.0.0.1:53    down   0.0  0     0    1   107      2      0.0    0.5    0
>> 1          10.3.5.13:53    up     1.0  0     5    1   71       0      0.0    0.6    0
>> 2          10.3.5.14:53    down   0.0  0     5    1   0        0      0.0    0.0    0
>> 3          10.6.3.1:53     up     0.0  0     9    1   0        0      0.0    0.0    0
>> <snip>
>> 22         10.4.3.66:53    up     0.0  0     9    1   0        0      0.0    0.0    0
>> 23         10.4.3.65:53    up     0.0  0     9    1   0        0      0.0    0.0    0
>> 24         10.8.3.1:53     DOWN   0.0  0     9    1   34       0      0.0    39.8   0
>> All                               0.0                 212      2
>>
>> # Putting last system in list (#24) back in active state, and dnsdist
>> # starts sending traffic to it again?!
>>
>> > getServer(24):setAuto()
>> > showServers()
>> #    Name  Address         State  Qps  Qlim  Ord  Wt  Queries  Drops  Drate  Lat    Outstanding  Pools
>> 0          127.0.0.1:53    down   0.0  0     0    1   107      2      0.0    0.5    0
>> 1          10.3.5.13:53    up     1.1  0     5    1   86       0      0.0    0.5    0
>> 2          10.3.5.14:53    down   0.0  0     5    1   0        0      0.0    0.0    0
>> 3          10.6.3.1:53     up     0.0  0     9    1   0        0      0.0    0.0    0
>> <snip>
>> 22         10.4.3.66:53    up     0.0  0     9    1   0        0      0.0    0.0    0
>> 23         10.4.3.65:53    up     0.0  0     9    1   0        0      0.0    0.0    0
>> 24         10.8.3.1:53     up     0.0  0     9    1   36       0      0.0    41.8   0
>> All                               1.0                 229      2
>>
>> # ...and traffic keeps getting sent to it, despite high latency and
>> # higher numerical order in the active systems list.
>>
>> > showServers()
>> #    Name  Address         State  Qps  Qlim  Ord  Wt  Queries  Drops  Drate  Lat    Outstanding  Pools
>> 0          127.0.0.1:53    down   0.0  0     0    1   107      2      0.0    0.5    0
>> 1          10.3.5.13:53    up     0.0  0     5    1   86       0      0.0    0.5    0
>> 2          10.3.5.14:53    down   0.0  0     5    1   0        0      0.0    0.0    0
>> 3          10.6.3.1:53     up     0.0  0     9    1   0        0      0.0    0.0    0
>> <snip>
>> 24         10.8.3.1:53     up     0.8  0     9    1   172      0      0.0    125.9  0
>> All                               0.0                 365      2
>>
>> # If I add a dummy entry at the end of the list w/ a higher priority,
>> # things work as they're supposed to (although, I'm not completely
>> # convinced it's taking latency in consideration when it fails over
>> # to all the same weighted systems, it seems to jump towards the end
>> # of that list regardless of latency).
>>
>> > showServers()
>> #    Name  Address         State  Qps  Qlim  Ord  Wt  Queries  Drops  Drate  Lat    Outstanding  Pools
>> 0          127.0.0.1:53    down   0.0  0     0    1   35       1      0.0    1.4    0
>> 1          10.3.5.13:53    down   0.0  0     5    1   31       2      0.0    0.6    0
>> 2          10.3.5.14:53    down   0.0  0     5    1   0        0      0.0    0.0    0
>> 3          10.6.3.1:53     DOWN   0.0  0     6    1   32       0      0.0    0.4    0
>> 4          10.3.8.27:53    up     1.0  0     7    1   239      0      0.0    0.4    0
>> <snip>
>> 21         10.4.3.2:53     up     0.0  0     9    1   0        0      0.0    0.0    0
>> 22         10.4.3.66:53    up     0.0  0     9    1   0        0      0.0    0.0    0
>> 23         10.4.3.65:53    up     0.0  0     9    1   431      0      0.0    148.6  0
>> 24         10.8.3.1:53     up     0.0  0     19   1   0        0      0.0    0.0    0
>> 25         127.0.0.255:53  down   0.0  0     99   1   0        0      0.0    0.0    0
>> All                               0.0                 768      3
> _______________________________________________
> dnsdist mailing list
> dnsdist@mailman.powerdns.com
> https://mailman.powerdns.com/mailman/listinfo/dnsdist
>

--
Remi Gacogne
PowerDNS.COM BV - https://www.powerdns.com/
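
For illustration only, the symptom in the report can be reproduced with a
small toy model. This is a sketch in Python, not dnsdist's code: the
Server class, sort_servers() and first_available() are hypothetical
names, QPS limits are ignored, and the stable tie-breaking between
equal orders is an assumption chosen because it matches the
showServers() output quoted above. The only fact taken from this thread
is that the last added server was sorted as if it had an order of 0.

```python
from dataclasses import dataclass


@dataclass
class Server:
    address: str
    order: int
    up: bool = True


def sort_servers(servers, buggy=False):
    """Sort by order. sorted() is stable, so ties keep insertion order."""
    last = len(servers) - 1

    def key(item):
        index, server = item
        # Toy model of the reported bug: the last-added server sorts as order 0.
        if buggy and index == last:
            return 0
        return server.order

    return [server for _, server in sorted(enumerate(servers), key=key)]


def first_available(servers, buggy=False):
    """Return the first server that is up in sorted order, or None."""
    for server in sort_servers(servers, buggy):
        if server.up:
            return server
    return None


backends = [
    Server("127.0.0.1:53", 0, up=False),  # local named has been stopped
    Server("10.3.5.13:53", 5),
    Server("10.6.3.1:53", 9),
    Server("10.8.3.1:53", 9),             # last-added backend
]

print(first_available(backends, buggy=True).address)   # prints 10.8.3.1:53
print(first_available(backends, buggy=False).address)  # prints 10.3.5.13:53
```

In this model, appending a down dummy backend (as in the last
showServers() output above) moves the faulty "order 0" slot onto a
server that is never selected, which is consistent with the workaround
appearing to restore the expected ordering.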