Hi,
Apologies in advance for the length of this message. As a person 'somewhat' involved in inflicting this topic on dnsop in the first place and listed as a co-author (although Suzanne should get the credit for keeping this alive), I have to admit I'm confused.
My intent in writing the original draft was to document existing practice and behavior (a txt query in the chaos class for {version,hostname}.bind) and suggest an alternative with minimal changes that would be less offensive to other vendors (a txt query in the chaos class for {version,id}.server).
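For anyone who wants to see how small the difference between the two queries is on the wire, here is a rough sketch in Python that builds a chaos/txt question for either name. The function names and the default query ID are mine, purely for illustration; only the numeric values (class CH = 3, type TXT = 16) come from RFC 1035.

```python
import struct

CLASS_CH = 3   # CHAOS class, per RFC 1035
TYPE_TXT = 16  # TXT record type, per RFC 1035


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) < 64:
            raise ValueError("each label must be 1 to 63 bytes long")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_chaos_txt_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a one-question DNS query for `name`, type TXT, class CH.

    query_id is an arbitrary illustrative default; a real client would
    pick a random value per query.
    """
    # Header: ID, flags (all zero = standard query, recursion not desired),
    # QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!HH", TYPE_TXT, CLASS_CH)
    return header + question


old_style = build_chaos_txt_query("hostname.bind")
new_style = build_chaos_txt_query("id.server")
```

The two packets differ only in the encoded name, which is the whole point of the proposal. Either one could be sent over UDP to port 53 of the server under test, and the TXT string in the answer is whatever identifier the operator configured.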
The reason I suggested the approach I did was that I figured the folks who implemented chaos/txt *.bind (which, surprisingly enough, isn't just BIND) would find it trivial to change the string queried for. Mucking about with anything more complicated than that (e.g., OPT RRs, the status opcode, multiple questions per query, or whatever) would imply code changes and thus require significantly more work (in a relative sense). In fact, as I was coming up with my original draft, I asked some other DNS server implementors if they'd be willing to do chaos/txt id.server and they indicated that it wouldn't be out of the question.
The reason I thought this functionality would be useful was because the root servers were then just beginning (!) to be anycasted and one of the concerns expressed about this approach for strengthening the roots was that if an anycast root server instance went off into the weeds, it would be nice to be able to identify the culprit instance.
In other words, my goal was to suggest something _simple_ to meet an actual and immediate _operational_ need. A novel and heretical thought, I know. I was and am perfectly well aware that my original proposal was far from optimal; however, I felt it had the best chance of actually being implemented in real live operational servers before hell froze over.
Unfortunately, after a particularly unpleasant public exchange with Randy Bush (for which I belatedly apologize to dnsop participants who had to wade through it), I gave up in disgust. Subsequently, Suzanne took on the responsibility of continuing to tilt at this particular windmill.
I have now actually read the current draft and I must be missing something. I apologize if this has been hashed out while I was off being distracted by other things and the answer is obvious to everyone else, but...
One of the major objections to the original id.server approach was that the chaos/txt query would only be sent after someone noticed the server at a particular IP address was being bad, implying this reactive query could take a different path and end up identifying the wrong server.
However, unless the "good answer" (the characteristics of which are described in the draft) exists in each and every query, anyone experiencing difficulties that require identification of a particular server will need to send additional queries with the necessary flags to include the {opt RR, status opcode, additional question, whatever} that tells the server to identify itself. Obviously, if you have to do this, the query might take a different path and get you no closer to what you need to know than an out-of-band approach.
If I assume the identifying thingie is always on, then you get into the problems of dealing with DNS servers/load balancers/firewalls/NAT boxes that do not (and likely never will) understand such esoterica as any of the possible solutions discussed. This implies we'll have to do the same sort of silliness we do with EDNS0 which also implies the identifying thingie simply can't be used.
Of course, one could argue that broken and/or obsolete servers would get replaced but experience has shown it exceedingly difficult to get people to upgrade their servers, even for root level exploits in those servers.
A more reasonable argument can be made that DNS servers/load balancers/firewalls/NAT boxes/etc. may treat class chaos queries differently from class IN queries; however, class chaos queries are obviously made today and they get responded to, thus there is at least some running code that indicates they 'work' (for some definition of that variable). This obviously can't be said of any of the in-band proposals.
Lastly, regardless of whether the identifying thingie is always on or not, I believe in the vast majority of cases, you'll need some DNS savvy person to actually research and discover that a server instance is misbehaving. As such, for an in-band solution to be any different than an out-of-band solution, you'll need to log every response on the off chance that badness will occur. This seems unlikely to me.
With regard to the other disadvantages of the current approach (which, by extension, can be applied to the id.server approach):
- section 3.2, #2
If there is anyone today who is actually using the CHAOS class as it was originally intended, I would be fascinated to hear about it.
While I was not the one who chose to use the CHAOS class for this query, I would bet that person's life (:-)) that the version.bind query or functional equivalent is the _only_ current use of that class. Given class CH is specified in RFC 1034/1035 and it is both inadvisable and has proven well nigh impossible (at least in the real world) to remove anything defined and implemented in the Internet today, I would argue that we might as well take advantage of the CHAOS class's existence. This particular use of the CHAOS class is supported by the DNS specifications -- the only protocol violation that might occur is in those name servers that are used for CHAOS naming. I suspect this set of name servers is rather small.
Of course it can be (and has been) argued that implementing class CH is a waste of time. I don't disagree. I know of at least one DNS server implementation that implements only the ability to answer a TXT query for a particular string (no points for guessing what string). Alternatively, an implementor may choose to not implement the identification mechanism (whatever it might be) and simply accept the fact that they will be decreasing the chances that their server will be used in anycast/load balancing situations. That's fine too. There are plenty of DNS servers out there these days.
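To give a feel for how little "implementing class CH" has to mean in practice, here is a rough Python sketch of a handler that answers exactly one chaos/txt question and nothing else. It is not modelled on any particular server: the function names, the "instance-a" identifier, the choice of REFUSED for every other query, the zero TTL, and the omission of opcode checks are all my assumptions for illustration.

```python
import struct

CLASS_CH, TYPE_TXT = 3, 16   # values from RFC 1035
RCODE_REFUSED = 5            # assumed reply for anything else


def parse_question(packet: bytes):
    """Return (name, qtype, qclass, end_offset) for the first question.

    Assumes the question name is uncompressed, as it normally is in queries.
    """
    pos, labels = 12, []
    while True:
        length = packet[pos]
        pos += 1
        if length == 0:
            break
        if length > 63:
            raise ValueError("compressed or invalid label in question")
        labels.append(packet[pos:pos + length].decode("ascii").lower())
        pos += length
    qtype, qclass = struct.unpack("!HH", packet[pos:pos + 4])
    return ".".join(labels), qtype, qclass, pos + 4


def answer_identity_query(packet: bytes, server_id: str,
                          name: str = "id.server") -> bytes:
    """Answer a chaos/txt query for `name`; refuse everything else."""
    query_id, flags, qdcount = struct.unpack("!HHH", packet[:6])
    if qdcount != 1:
        raise ValueError("expected exactly one question")
    qname, qtype, qclass, end = parse_question(packet)
    question = packet[12:end]
    rd = flags & 0x0100  # echo the RD bit back to the client
    if (qname, qtype, qclass) != (name, TYPE_TXT, CLASS_CH):
        resp_flags = 0x8000 | rd | RCODE_REFUSED  # QR=1, rcode REFUSED
        return struct.pack("!HHHHHH", query_id, resp_flags, 1, 0, 0, 0) + question
    txt = server_id.encode("ascii")
    if len(txt) > 255:
        raise ValueError("TXT string too long for a single character-string")
    rdata = bytes([len(txt)]) + txt
    resp_flags = 0x8000 | 0x0400 | rd  # QR=1, AA=1, rcode NOERROR
    # Owner name is a compression pointer (0xC00C) back to the question name.
    answer = struct.pack("!HHHIH", 0xC00C, TYPE_TXT, CLASS_CH, 0, len(rdata)) + rdata
    return struct.pack("!HHHHHH", query_id, resp_flags, 1, 1, 0, 0) + question + answer
```

A server built this way never needs zone data, delegation, or any other class CH machinery; it compares one name, one type, and one class, and returns a configured string.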
- section 3.2, #3
As I mentioned above, there are multiple implementations that support the general concept, if not the exact implementation of chaos/txt version.bind. Those implementations include BIND, Nominum ANS, PowerDNS, MyDNS, and NSD (at least). There are probably others, but I didn't bother to investigate further. There were some folks who, when I asked, indicated it would be a lot easier for them to get management to agree if the TLD was changed to something less vendor specific, thus I chose "server". Of the folk I spoke to who didn't dismiss the idea out of hand due to the use of class CH, all indicated id.server would be an acceptable alternative. However, this was a (long) while back, so things might have changed.
In any event, what am I missing? Is there some reason we're trying to make this more complicated than it has to be? Or is this yet another example of the OSIfication of the IETF?
Thanks, -drc
