IE 8 has a mechanism for highlighting the "owning domain" of the current
website in the address bar.[0] Because IE does not use the Public Suffix
List, I asked Microsoft whether they would explain their algorithm for
deciding what to highlight.

A Microsoft engineer was kind enough to explain the algorithm to me, and
I have reimplemented it in Perl for my own understanding. I thought it
might also be of interest to DNSOP. Please find it attached. Note that
this is the algorithm for determining the "domain" property of the IURI
object; its (close) relationship to what is highlighted is explained in
comments in the script.

DNSOP might find it instructive to compare their approach with that of
KDE[1] and of Mozilla.

Gerv

[0]
http://blogs.msdn.com/ie/archive/2008/03/11/address-bar-improvements-in-internet-explorer-8-beta-1.aspx
[1] http://www.ietf.org/mail-archive/web/dnsop/current/msg06185.html
#!/usr/bin/perl -w
# Implementation of the IE algorithm for determining the IURI 'domain' 
# property from a FQDN. This property is important in determining the 
# behaviour of IE 8's address bar highlighting.
# 
# Written by Gervase Markham <[EMAIL PROTECTED]>
# 2008-08-25
# Any copyright possessed by me is dedicated to the Public Domain.
# http://creativecommons.org/licenses/publicdomain/
# This statement has no effect on any rights Microsoft may or may not have.
# 
# This implementation is based on an explanation from Eric Lawrence of 
# Microsoft (see numbered comments). Many thanks to him. No Microsoft code 
# was viewed or copied to create this implementation.
# 
# IURI's "domain" property 
# (http://msdn.microsoft.com/en-us/library/ms775016(VS.85).aspx) is made up of 
# "PublicSuffix + 1". If there is no Public Suffix (e.g. http://foo/) then the 
# domain property is null. [This script determines the value of this property.]
# 
# IURI's "host" property 
# (http://msdn.microsoft.com/en-us/library/ms775019(VS.85).aspx) returns the 
# FQDN.
# 
# IE's address bar highlighting will highlight either the IURI domain or, if 
# the (IURI domain==null) AND the hostname is plain (no dots), then it will 
# highlight the plain hostname. If the domain is null and the hostname is 
# dotted, then nothing is highlighted (considered an error condition).
#
# Microsoft does not guarantee that any of this will always continue to work
# this way.

my $fqdn = $ARGV[0] || die "No FQDN given.\n";

print getDomain($fqdn) . "\n";
exit(0);

sub getDomain {
  # 1> If the final label is empty, drop it for the purposes of this algorithm
  # Otherwise "www.ericlawrence.com." would have four labels "www", 
  # "ericlawrence", "com", "".  Instead, we drop the final label.
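  # GRM NOTE: split() automatically does this (it discards trailing empty
  # fields).
  # Lowercase first: DNS names are case-insensitive and the rules below
  # compare against lowercase literals. (Assumption: the explanation from
  # Microsoft does not mention case.)
  my @labels = split(/\./, lc($_[0]));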
  
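  my $labelCount = getNoOfLabelsInDomain(@labels);

  # A count of 0 means "no domain" (rule 3). Return "" explicitly rather than
  # relying on the slice @labels[-0..-1] below happening to be empty.
  return "" if $labelCount == 0;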
  
  # 2> Name the labels Ln,...,L3,L2,L1; decreasing from start (Leftmost=Ln) to 
  # finish (Rightmost=L1).
  # If at any point in this algorithm the result demands >n labels, getDomain 
  # returns "".
  if (@labels < $labelCount) {
    print STDERR "Rule 2: algorithm requires more labels than present.\n";
    return "";
  }
  else {
    return join(".", @labels[-$labelCount..-1]);
  }
}

sub getNoOfLabelsInDomain {
  my @labels = @_;
  
  # 3> Check n > 1.  If not, there's no domain, just a plain hostname. Return 
  # ""; exit.
  # Dotless FQDNs consist of a host only, there is no domain.
  if (@labels < 2) {
    print STDERR "Rule 3: only one label.\n";
    return 0;
  }

  # 4> Check L1 == "tv".  If so, getDomain returns L2.L1; exit.
  # "tv" is a special-case "completely flat" ccTLD for historical reasons.
  if ($labels[-1] eq "tv") {
    print STDERR "Rule 4: .tv is special.\n";
    return 2;
  }

  # 5> Check Len(L1) > 2.  If so, getDomain returns L2.L1; exit.
  # Len(L1)>2 suggests L1 is a gTLD rather than a ccTLD.
  # If Len(L1)<=2 we assume L1 is a part of a ccTLD.
  if (length($labels[-1]) > 2) {
    print STDERR "Rule 5: length of L1 > 2 therefore gTLD.\n";
    return 2;
  }

  # 6> Check if L2 in gTLD list "com,edu,net,org,gov,mil,int".  If so, 
  # getDomain returns L3.L2.L1; exit.
  # gTLDs, when they appear immediately left of a ccTLD (modulo exception in 
  # step 4), are considered a part of the TLD.
  if ($labels[-2] =~ /^(com|edu|net|org|gov|mil|int)$/) {
    print STDERR "Rule 6: L2 is in gTLD list therefore return 3 parts.\n";
    return 3;
  }

  # 7> If L1 is in the list "GR,PL" AND L2 is NOT in the gTLD list, getDomain 
  # returns L2.L1; exit.
  # GR and PL are considered "flat" ccTLDs EXCEPT when a gTLD appears in L2.
  # getDomain("a.pl") returns "a.pl"
  # getDomain("a.uk") returns ""  
  # GRM NOTE: no check on L2 because it's caught by code above
  if ($labels[-1] =~ /^(gr|pl)$/) {
    print STDERR "Rule 7: .gr and .pl are special.\n";
    return 2;
  }

  # 8> If Len(L2) < 3 getDomain returns L3.L2.L1; exit.
  # getDomain("aa.bb.cc") returns "aa.bb.cc"
  if (length($labels[-2]) < 3) {
    print STDERR "Rule 8: length(L2) < 3.\n";
    return 3;
  }

  # 9> Otherwise, getDomain returns L2.L1
  # getDomain("aa.bbb.cc") returns "bbb.cc"
  print STDERR "Rule 9: default.\n";
  return 2;
}
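
# A minimal sketch of the address bar highlighting decision described in the
# header comments above. This is NOT part of the explanation from Microsoft
# and is not IE code; the name highlightFor and its two-argument interface are
# hypothetical. It takes the hostname and the value getDomain() produced for
# it, and returns the text that would be highlighted ("" for nothing).
sub highlightFor {
  my ($host, $domain) = @_;

  # A non-null domain is what gets highlighted.
  return $domain if defined($domain) && $domain ne "";

  # No domain and a plain (dotless) hostname: highlight the hostname itself.
  return $host if index($host, ".") < 0;

  # No domain and a dotted hostname: nothing is highlighted (error condition).
  return "";
}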
_______________________________________________
DNSOP mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/dnsop
