IE 8 has a mechanism for highlighting the "owning domain" of the current website in the address bar.[0] As they are not using the Public Suffix List, I asked if they would explain their algorithm for deciding what to highlight.
A Microsoft engineer was kind enough to explain the algorithm to me, and I have reimplemented it in Perl for my own understanding. I thought it might also be of interest to DNSOP. Please find it attached.

Note that this is the algorithm for determining the "domain" property of the IURI object; its (close) relationship to what is highlighted is explained in comments in the script.

DNSOP might find it instructive to compare their approach with that of KDE[1] and of Mozilla.

Gerv

[0] http://blogs.msdn.com/ie/archive/2008/03/11/address-bar-improvements-in-internet-explorer-8-beta-1.aspx
[1] http://www.ietf.org/mail-archive/web/dnsop/current/msg06185.html

#!/usr/bin/perl -w
# Implementation of the IE algorithm for determining the IURI 'domain'
# property from a FQDN. This property is important in determining the
# behaviour of IE 8's address bar highlighting.
#
# Written by Gervase Markham <[EMAIL PROTECTED]>
# 2008-08-25
# Any copyright possessed by me is dedicated to the Public Domain.
# http://creativecommons.org/licenses/publicdomain/
# This statement has no effect on any rights Microsoft may or may not have.
#
# This implementation is based on an explanation from Eric Lawrence of
# Microsoft (see numbered comments). Many thanks to him. No Microsoft code
# was viewed or copied to create this implementation.
#
# IURI's "domain" property
# (http://msdn.microsoft.com/en-us/library/ms775016(VS.85).aspx) is made up of
# "PublicSuffix + 1". If there is no Public Suffix (e.g. http://foo/) then the
# domain property is null. [This script determines the value of this property.]
#
# IURI's "host" property
# (http://msdn.microsoft.com/en-us/library/ms775019(VS.85).aspx) returns the
# FQDN.
#
# IE's address bar highlighting will highlight either the IURI domain or, if
# the (IURI domain==null) AND the hostname is plain (no dots), then it will
# highlight the plain hostname. If the domain is null and the hostname is
# dotted, then nothing is highlighted (considered an error condition).
#
# Microsoft does not guarantee that any of this will always continue to work
# this way.

my $fqdn = $ARGV[0] || die "No FQDN given.\n";
print getDomain($fqdn) . "\n";
exit(0);

sub getDomain {
  # 1> If the final label is empty, drop it for the purposes of this algorithm
  # Otherwise "www.ericlawrence.com." would have four labels "www",
  # "ericlawrence", "com", "". Instead, we drop the final label.
  # GRM NOTE: split() automatically does this.
  my @labels = split('\.', $_[0]);
  my $labelCount = getNoOfLabelsInDomain(@labels);

  # 2> Name the labels Ln,...,L3,L2,L1; decreasing from start (Leftmost=Ln) to
  # finish (Rightmost=L1).
  # If at any point in this algorithm the result demands >n labels, getDomain
  # returns "".
  if (@labels < $labelCount) {
    print STDERR "Rule 2: algorithm requires more labels than present.\n";
    return "";
  }
  else {
    return join(".", @labels[-$labelCount..-1]);
  }
}

sub getNoOfLabelsInDomain {
  my @labels = @_;

  # 3> Check n > 1. If not, there's no domain, just a plain hostname. Return
  # ""; exit.
  # Dotless FQDNs consist of a host only, there is no domain.
  if (@labels < 2) {
    print STDERR "Rule 3: only one label.\n";
    return 0;
  }

  # 4> Check L1 == "tv". If so, getDomain returns L2.L1; exit.
  # "tv" is a special-case "completely flat" ccTLD for historical reasons.
  if ($labels[-1] eq "tv") {
    print STDERR "Rule 4: .tv is special.\n";
    return 2;
  }

  # 5> Check Len(L1) > 2. If so, getDomain returns L2.L1; exit.
  # Len(L1)>2 suggests L1 is a gTLD rather than a ccTLD.
  # If Len(L1)<=2 we assume L1 is a part of a ccTLD.
  if (length($labels[-1]) > 2) {
    print STDERR "Rule 5: length of L1 > 2 therefore gTLD.\n";
    return 2;
  }

  # 6> Check if L2 in gTLD list "com,edu,net,org,gov,mil,int". If so,
  # getDomain returns L3.L2.L1; exit.
  # gTLDs, when they appear immediately left of a ccTLD (modulo exception in
  # step 4), are considered a part of the TLD.
  if ($labels[-2] =~ /^(com|edu|net|org|gov|mil|int)$/) {
    print STDERR "Rule 6: L2 is in gTLD list therefore return 3 parts.\n";
    return 3;
  }

  # 7> If L1 is in the list "GR,PL" AND L2 is NOT in the gTLD list, getDomain
  # returns L2.L1; exit.
  # GR and PL are considered "flat" ccTLDs EXCEPT when a gTLD appears in L2.
  # getDomain("a.pl") returns "a.pl"
  # getDomain("a.uk") returns ""
  # GRM NOTE: no check on L2 because it's caught by code above
  if ($labels[-1] =~ /^(gr|pl)$/) {
    print STDERR "Rule 7: .gr and .pl are special.\n";
    return 2;
  }

  # 8> If Len(L2) < 3 getDomain returns L3.L2.L1; exit.
  # getDomain("aa.bb.cc") returns "aa.bb.cc"
  if (length($labels[-2]) < 3) {
    print STDERR "Rule 8: length(L2) < 3.\n";
    return 3;
  }

  # 9> Otherwise, getDomain returns L2.L1
  # getDomain("aa.bbb.cc") returns "bbb.cc"
  print STDERR "Rule 9: default.\n";
  return 2;
}

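The script above only computes the IURI "domain" value. The address bar rule described in its header comments (highlight the domain; failing that, highlight a dotless hostname; otherwise highlight nothing) could be sketched along the following lines. This is an illustration only: highlightedPart() is a hypothetical helper, not part of IE or of the IURI API, and it assumes the caller passes in the domain value that getDomain() would print, with "" standing in for null.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (an assumption for illustration, not a Microsoft API).
#   $host   - the IURI "host" value, i.e. the FQDN
#   $domain - the IURI "domain" value as getDomain() would print it;
#             "" stands in for null
# Returns the text that would be highlighted, or "" if nothing would be.
sub highlightedPart {
    my ($host, $domain) = @_;

    # A non-null domain is what gets highlighted.
    return $domain if defined $domain && $domain ne "";

    # Null domain and a plain (dotless) hostname: highlight the hostname.
    return $host if index($host, ".") == -1;

    # Null domain and a dotted hostname: error condition, no highlight.
    return "";
}

# Each pair is (host, domain). The domain values are the ones the rules in
# the script give for these hosts (Rule 5, Rule 3 and Rule 2 respectively).
my @cases = (
    [ "www.example.com", "example.com" ],
    [ "foo",             ""            ],
    [ "a.uk",            ""            ],
);

for my $case (@cases) {
    my ($host, $domain) = @$case;
    printf "%-16s -> '%s'\n", $host, highlightedPart($host, $domain);
}
```

Under these assumptions the sketch highlights example.com for www.example.com, foo for the plain hostname foo, and nothing at all for a.uk.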
_______________________________________________
DNSOP mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/dnsop
