I would caution against putting much faith in geolocation or site identification derived from reverse DNS PTR records. There are a vast number of unmaintained, stale, erroneous or wildly wrong PTR records out there. I can name at least half a dozen ISPs that have absorbed other ASes, some of which had themselves acquired other ASes earlier in their history, forming a turducken of obsolete PTR records that includes hostnames under ISP domain names last in use in 2002.
On Mon, Apr 29, 2019 at 6:15 AM Matthew Luckie <[email protected]> wrote:
> Hi NANOG,
>
> To support Internet topology analysis efforts, I have been working on
> an algorithm to automatically detect router names inside hostnames
> (PTR records) for router interfaces, and build regular expressions
> (regexes) to extract them. By "router name" inside the hostname, I
> mean a substring, or set of non-contiguous substrings, that is common
> among interfaces on a router. For example, suppose we had the
> following three routers in the savvis.net domain suffix, each with two
> interfaces:
>
> das1-v3005.nj2.savvis.net
> das1-v3006.nj2.savvis.net
>
> das1-v3005.oc2.savvis.net
> das1-v3007.oc2.savvis.net
>
> das2-v3009.nj2.savvis.net
> das2-v3012.nj2.savvis.net
>
> We might infer the router names are das1|nj2, das1|oc2, and das2|nj2,
> respectively, and captured by the regex:
> ^([a-z]+\d+)-[^\.]+\.([a-z]+\d+)\.savvis\.net$
>
> After much refinement based on smaller sets of ground truth, I'm
> asking for broader feedback from operators. I've placed a webpage at
> https://www.caida.org/~mjl/rnc/ that shows the inferences my algorithm
> made for 2523 domains. If you operate one of the domains in that
> list, I would appreciate it if you could comment (private is probably
> better but public is fine with me) on whether the regex my algorithm
> inferred represents your naming intent. In the first instance, I am
> most interested in feedback for the suffix / date combinations for
> suffixes that are colored green, i.e. appear to be reasonable.
>
> Each suffix / date combination links to a page that contains the
> naming convention and corresponding inferences. The colored part of
> each hostname is the inferred router name. The green hostnames appear
> to be correct, at least as far as the algorithm determined. Some
> suffixes have errors due to either stale hostnames or incorrect
> training data, and those hostnames are colored red or orange.
>
> If anyone is interested in sets of hostnames the algorithm may have
> inferred as 'stale' for their network, because for some operators it
> was an oversight and they were grateful to learn about it, I can
> provide that information.
>
> Thanks,
>
> Matthew
>
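For anyone who wants to try the savvis.net example from the quoted message, here is a minimal Python sketch. It applies the quoted regex to the six example hostnames and groups them by the inferred router name. The function names (router_name, group_by_router) are made up for illustration and are not part of Matthew's tooling. The sketch only applies a regex that is already known; it does not attempt the inference algorithm itself.

```python
import re
from collections import defaultdict

# Regex quoted in the message above. The two capture groups, joined
# with "|", form the inferred router name (e.g. "das1|nj2").
ROUTER_RE = re.compile(r"^([a-z]+\d+)-[^\.]+\.([a-z]+\d+)\.savvis\.net$")

# The six example hostnames from the message.
HOSTNAMES = [
    "das1-v3005.nj2.savvis.net",
    "das1-v3006.nj2.savvis.net",
    "das1-v3005.oc2.savvis.net",
    "das1-v3007.oc2.savvis.net",
    "das2-v3009.nj2.savvis.net",
    "das2-v3012.nj2.savvis.net",
]


def router_name(hostname):
    """Return the router name for a hostname, or None if the regex does not match."""
    match = ROUTER_RE.match(hostname.lower())
    if match is None:
        return None
    return "|".join(match.groups())


def group_by_router(hostnames):
    """Map each inferred router name to the hostnames that share it.

    Hostnames that do not match the regex are skipped.
    """
    groups = defaultdict(list)
    for hostname in hostnames:
        name = router_name(hostname)
        if name is not None:
            groups[name].append(hostname)
    return dict(groups)


if __name__ == "__main__":
    for name, members in group_by_router(HOSTNAMES).items():
        print(name, members)
```

Note that a match only says the hostname fits the naming convention. It says nothing about whether the PTR record is still current, which is the concern raised at the top of this reply.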

