We do the same with Quagga or BIRD on Linux, using the OSPF daemon for
georedundancy and proximity-based load sharing of customer access to our
recursive BIND resolvers.

To avoid any tedious special-case setup, in our case the primary/secondary DNS
IPs are announced by the system as loopback addresses.
We don't have any specific monitoring to bring OSPF down on the server, since
we have lots of them (4 per POP) and dedicated scripted and human monitoring
24x7, so if a server has an issue the customer barely notices it before a human
acts and brings down the affected server.
We also had a power surge in a POP that took it entirely offline on the DNS
side (the network was on DC, while the problem affected AC power only, for some
racks), and 30 seconds later the service was up again using the DNS servers of
another POP. Very effective given the giant failure we had.

About the timeouts: you don't need to wait if you bring down the loopbacks
instead of the OSPF daemon. After the loopbacks are downed, OSPF advertises
that it no longer has those IPs, and the upstream routers load-share only
across the remaining servers.
Then you can shut the daemon down.
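
The order of operations above (loopbacks first, daemon last) could be sketched
roughly as follows. This is only an illustration in Python, not our actual
tooling: the addresses, the "ospfd" service name, the settle delay and the
function names are all assumptions, and with BIRD the last command would
differ. It defaults to a dry run that only prints the commands.

```python
import subprocess
import time


def drain_commands(service_ips, ospf_service="ospfd"):
    """Return the ordered commands: remove the loopback IPs first, stop OSPF last.

    service_ips are given with their prefix length, e.g. "192.0.2.53/32".
    The service name "ospfd" is an assumption (Quagga on a RHEL-like system).
    """
    cmds = [["ip", "addr", "del", ip, "dev", "lo"] for ip in service_ips]
    cmds.append(["service", ospf_service, "stop"])
    return cmds


def drain(service_ips, settle_seconds=5, dry_run=True):
    """Run (or just print, when dry_run is True) the drain sequence."""
    cmds = drain_commands(service_ips)
    loopback_cmds, stop_cmd = cmds[:-1], cmds[-1]

    for cmd in loopback_cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)

    # Give OSPF a moment to tell the routers the addresses are gone
    # before the daemon itself is stopped (the delay is a guess).
    if not dry_run:
        time.sleep(settle_seconds)

    if dry_run:
        print(" ".join(stop_cmd))
    else:
        subprocess.run(stop_cmd, check=True)


if __name__ == "__main__":
    # Documentation-range addresses, used here as placeholders only.
    drain(["192.0.2.53/32", "192.0.2.54/32"], dry_run=True)
```

The point of splitting it this way is that traffic has already moved to the
other servers by the time the daemon goes away, so nothing is left waiting on
an OSPF timeout.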

I considered using a probe, but found that it was overkill in our case, since a
simple transient hang in the network (STP issue, mismatched cabling) could have
brought down an entire POP for a minor event. We preferred human monitoring
instead, since a 24x7 service was already there for network alarms and could
easily correlate a problem with other causes or with a real server issue.
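
For anyone who does want a probe, one common way to soften the transient-hang
problem is to withdraw only after several consecutive failures. This is not
something we run; it is a minimal Python sketch, and the class name and the
threshold of 3 are hypothetical. How the probe result is obtained and how the
withdrawal is carried out are deliberately left out.

```python
from collections import deque


class DampedProbe:
    """Signal a withdrawal only after `threshold` consecutive failed probes."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        # Keeps only the most recent `threshold` results.
        self.recent = deque(maxlen=threshold)

    def record(self, ok):
        """Record one probe result; return True if the service IPs should be withdrawn."""
        self.recent.append(bool(ok))
        window_full = len(self.recent) == self.threshold
        return window_full and not any(self.recent)


if __name__ == "__main__":
    probe = DampedProbe(threshold=3)
    # A short blip (two failures) is ignored; three failures in a row are not.
    for result in [True, False, False, True, False, False, False]:
        print(result, probe.record(result))
```

A single success resets the run of failures, so a brief STP reconvergence would
not take the servers out, at the cost of reacting more slowly to a real outage.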

We haven't had a single software failure in more than 7 years with four
different installations (RHEL 3, CentOS 4, 5, 6) in a very complex environment
due to efficiency and legal constraints (we have upstream DNS servers providing
DNS poisoning as required by law, and a shared cache for all the anycast DNS
servers).

Ciao,
A.


On 15 Aug 2014, at 09:46, "Anand Buddhdev" <[email protected]> wrote:

> On 15/08/2014 00:00, Nat Morris wrote:
>
>> BGP sessions between the ASR 9xxxx and each DNS server in the cluster,
>> ExaBGP running on them announcing their loopback/service /32 + /128
>> address(es).
>>
>> Health check scripts on each service to probe for service ability,
>> retract the announcement upon failure.
>
> We are doing this exact same thing on many RIPE NCC DNS servers, and it
> works very well. The other advantage of BGP is that as soon as you
> withdraw the announcement, the router stops sending traffic to the
> server. With OSPF, you have timeouts of several seconds before traffic
> stops arriving at a dead server.
>
> Regards,
>
> Anand
> _______________________________________________
> dns-operations mailing list
> [email protected]
> https://lists.dns-oarc.net/mailman/listinfo/dns-operations
> dns-jobs mailing list
> https://lists.dns-oarc.net/mailman/listinfo/dns-jobs

