Hi Chris,

Answers in-line:
On 3/9/16, 1:20 PM, "[email protected] on behalf of Nate Davis" <[email protected] on behalf of [email protected]> wrote:

>On 3/9/16, 11:34 AM, "Christopher Morrow" <[email protected]>
>wrote:
>
>>Thanks!
>>(I have a few questions, which may not be answerable here, I suppose..
>>if they can be answered that'd be cool though)
>>
>>On Tue, Mar 8, 2016 at 12:59 PM, Nate Davis <[email protected]> wrote:
>>>
>>> ARIN's DNS process moves DNS data from the internal database to a
>>> Secure64 DNSSEC appliance to a hidden distribution master. From the
>>> hidden distribution master, zones are fetched to name server
>>> constellations from ARIN, VeriSign, and PCH.
>>>
>>> About two weeks ago a script was run that reset the serial on a zone in
>>> the database. This script was run to accommodate an inter-RIR network
>>
>>This script sounds like something that should/would happen
>>periodically? (whenever there's an xfer I guess?) is that correct?

Not even that frequently. It only needs to be run when we initially set up a /8 for out-of-region transfers. The script marks the /8 in our system so that we can start retrieving, validating, and aggregating the RIR snippets that go into our published zone file, and eventually do the right things to work with RPKI and so on. After this weekend's deploy, we will no longer need to run it, because the system will detect this case automatically and mark the zone.

>>> This incident exposed a gap in our monitoring that we are fixing. Our
>>
>>is/was the gap: "Make sure serial is monotonically increasing"
>>or is/was it: "If you are going to backup the serial, be sure to force
>>a reload on all masters via process X"
>>
>>(ie: If I make a serial change, what other things should I look for?
>>what monitoring gap do I also have?)

No. The gap was in our SOA checking, which only covered the path from the distribution master out to the anycast cloud.
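For illustration, here is a minimal Python sketch of an SOA check that treats every hop the same way, internal provisioning nodes included, and also flags a source serial that changes without increasing under RFC 1982 serial number arithmetic (the comparison secondaries use to decide whether a zone is newer). It assumes the serials have already been collected from each node by some other means; the `SoaMonitor` class, the node names, and the lag threshold are hypothetical and are not ARIN's actual tooling.

```python
from typing import Dict, List, Optional

# RFC 1982 serial number arithmetic with SERIAL_BITS = 32 (SOA serials).
HALF = 2 ** 31


def serial_gt(s1: int, s2: int) -> bool:
    """True if serial s1 is 'greater' (newer) than s2 under RFC 1982."""
    return (s1 < s2 and s2 - s1 > HALF) or (s1 > s2 and s1 - s2 < HALF)


class SoaMonitor:
    """Hypothetical monitor fed with the SOA serial each node is serving.

    `reference` is the serial the source (e.g. the database or signer) says
    should be live; `observed` maps node names to the serial they answer with.
    """

    def __init__(self, max_lag_seconds: float) -> None:
        self.max_lag = max_lag_seconds
        self.last_reference: Optional[int] = None
        self.mismatch_since: Dict[str, float] = {}  # node -> first time seen out of sync

    def check(self, reference: int, observed: Dict[str, int], now: float) -> List[str]:
        alerts: List[str] = []
        # Secondaries only transfer a zone whose serial is newer than their copy,
        # so a reference that changed without increasing will not propagate.
        if (self.last_reference is not None
                and reference != self.last_reference
                and not serial_gt(reference, self.last_reference)):
            alerts.append(f"reference serial went backwards: {self.last_reference} -> {reference}")
        self.last_reference = reference
        # Every hop is checked the same way, internal nodes included.
        for node, serial in sorted(observed.items()):
            if serial == reference:
                self.mismatch_since.pop(node, None)
                continue
            first_seen = self.mismatch_since.setdefault(node, now)
            if now - first_seen > self.max_lag:
                alerts.append(f"{node} serving {serial}, expected {reference}")
        return alerts
```

Alerting on any mismatch that persists, rather than only on nodes that are "behind", is deliberate: after a serial reset the downstream copies look newer than the source, so a behind-only check would stay silent.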
We have had incidents in the past where various nodes were not fetching the latest zone within a reasonable interval. So we added checks to make sure the SOA on each node updated within a "reasonable interval"; if a node did not, the on-call staff were notified so they could escalate. Unfortunately, we did not apply the same monitoring inside our own provisioning flow, so our internal nodes were not monitored appropriately. That has now been fixed.

>>For dnssec I suppose you'd be doing the above but pulling rrsig for
>>the SOA and making sure they are all the same.

What we want is to catch a problem before the signature expires. Do you have any ideas?

Thanks,
Mark

_______________________________________________
arin-tech-discuss mailing list
[email protected]
http://lists.arin.net/mailman/listinfo/arin-tech-discuss
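On catching signatures before they expire: one option, sketched below in Python, is to pull the SOA RRSIG expiration field from every node, compare the values (as Chris suggests), and alert once any node's signature falls inside a warning window. This is a minimal sketch under assumptions: gathering the field (e.g. from `dig +dnssec` output against each node) is not shown, and `parse_sig_time`, `rrsig_alerts`, and the warning window are hypothetical names and choices, not an existing tool.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def parse_sig_time(field: str) -> datetime:
    """Parse an RRSIG expiration/inception field as presented per RFC 4034, section 3.2.

    The field is either YYYYMMDDHHmmSS in UTC (always 14 digits) or an
    unsigned decimal count of seconds since the epoch (at most 10 digits).
    """
    if len(field) == 14:
        return datetime.strptime(field, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return datetime.fromtimestamp(int(field), tz=timezone.utc)


def rrsig_alerts(expirations: Dict[str, str], now: datetime,
                 warn_before: timedelta) -> List[str]:
    """Hypothetical check over {node: SOA RRSIG expiration field}; `now` must be UTC-aware."""
    parsed = {node: parse_sig_time(f) for node, f in expirations.items()}
    alerts: List[str] = []
    # Different expirations suggest nodes are serving differently signed copies
    # of the zone (or a transfer is still in flight; pair this with a serial check).
    if len(set(parsed.values())) > 1:
        alerts.append("nodes disagree on SOA RRSIG expiration")
    # Warn while there is still time to re-sign and push the zone out.
    for node in sorted(parsed):
        remaining = parsed[node] - now
        if remaining < warn_before:
            alerts.append(f"{node}: SOA RRSIG expires in {remaining}")
    return alerts
```

The warning window is a policy choice; it should comfortably exceed the time needed to notice the alert, re-sign, and propagate the zone to every constellation.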
