Re: utf-8 -> punycode for ServerName|Alias?

William A. Rowe Jr. Mon, 30 Jul 2012 13:55:21 -0700

On 4/7/2012 2:59 AM, Tim Bannister wrote:
> On 7 Apr 2012, at 07:33, William A. Rowe Jr. wrote:
> 
>> So we have live registrars, no longer "experimental", who are now 
>> registering domains in punycode.  Make of it what you will.
>>
>> Do we want to recognize non-ASCII strings in the ServerName|Alias directives 
>> as utf-8 -> punycode encodings?  Internally, from the time the servername 
>> field is assigned, it can be an ascii mapping.
> 
> I think this is more important for mass virtual hosting (VirtualDocumentRoot 
> from mod_vhost_alias, etc). Users would create a document root directory 
> named, eg, テスト.example and expect it to work. They don't know anything about 
> Unicode, let alone punycode.
> I reckon a lot of users would work out quickly that only Roman characters 
> work in domain names, but they aren't going to be able to work out how to 
> rename that folder into the correct punycode nor to tell the folders apart if 
> renamed in this way.
> 
> As a user: I already have a configuration file with a UTF-8 ServerAlias 
> defined, that's just waiting for httpd to implement this feature … and until 
> then, I have the punycoded version in there as well.


I've spent a bit more time on this.  The obvious issue of ambiguous domain
registrations is being handled on a registrar-by-registrar basis, and you
can get a nice summary of the punycode entries accepted by various registrars
here: http://www.mozilla.org/projects/security/tld-idn-policy-list.html

In thinking about which punycode names would be dangerous to represent as
UTF-8, I can't come up with any within the context of httpd.

 1. User VirtualHost ServerName/ServerAlias entries, or mod_vhost_alias
    entries.  These are controlled by the administrator, not affected by
    the remote client.  Provided that client-provided non-ASCII domains
    are refused, then punycode can be represented as UTF-8 in our access
    and error logs, server config directives and so forth when referring
    to the locally configured domain names.  We should always present
    these in things like mod_info and httpd -D DUMP_VHOSTS as name(punyname)
    to help the administrator to untangle any confusion.

 2. Location: headers and automated self-url references must present
    the punycode url in href= and other header fields, but may present the
    utf-8 in the presentation context such as error pages or autoindexes, etc.
    Whatever the W3C has to say about this in HTML5 is irrelevant if we don't
    know whether the user agent supports utf-8 -> punycode transliteration.
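
To make the two points above concrete, here is a minimal sketch in Python
(not httpd code).  It assumes Python's built-in "idna" codec, which follows
the older IDNA 2003 rules, as a stand-in for whatever transliteration routine
ends up in apr-util.  The helper names to_punycode, display_name and
location_header are hypothetical, chosen only for illustration.

```python
def to_punycode(name: str) -> str:
    """Return the ASCII (punycode) form of a host name.

    Assumption: Python's built-in "idna" codec stands in for the real
    transliteration routine.
    """
    return name.encode("idna").decode("ascii")


def display_name(name: str) -> str:
    """Format a locally configured name as name(punyname).

    Pure-ASCII names have no separate punycode form, so they are
    returned unchanged.
    """
    puny = to_punycode(name)
    return name if puny == name else f"{name}({puny})"


def location_header(name: str, path: str = "/") -> str:
    """Build a Location: value that carries only the ASCII host name."""
    return f"http://{to_punycode(name)}{path}"


if __name__ == "__main__":
    configured = "\u30c6\u30b9\u30c8.example"  # the name from Tim's example
    print(display_name(configured))
    print(location_header(configured, "/index.html"))
```

The UTF-8 form appears only in the administrator-facing listing; anything
sent to the client as a header uses the ASCII form.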

What is less clear is what precautions we should take when functioning as
a forward proxy with proxy uri string contents, or presenting user-provided,
non-canonicalized host names.  I can imagine such translation being abused to
conceal some forms of XSS exploitation.

I'd start by assembling a patch to introduce punycode transliteration into the
apr-util library and another patch into httpd for vhost, mass-vhosting using
utf-8 path names, and presenting trusted utf-8 values for our error log and
field tokens.  Does anyone have concerns before I begin messing with this logic?
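
For the mass-vhosting piece, the lookup I have in mind could be sketched
roughly as follows, again in Python rather than C and again assuming the
built-in "idna" codec as a placeholder for the apr-util routine.  The
function name vhost_directory and the base path are hypothetical.  Port
suffixes and the rest of the usual host validation are left out.

```python
import os


def vhost_directory(base: str, host_header: str) -> str:
    """Map a client-supplied Host value to a UTF-8 directory under base.

    Raises ValueError when the host name is refused.
    """
    # Client-provided non-ASCII host names are refused outright.
    if not host_header.isascii():
        raise ValueError("non-ASCII host name from client refused")

    host = host_header.lower()

    # Refuse characters that could escape the base directory.
    if "/" in host or "\\" in host or "\0" in host:
        raise ValueError("invalid character in host name")

    # Punycode -> UTF-8, so the administrator's directory name matches.
    utf8_name = host.encode("ascii").decode("idna")

    # Round-trip check: only canonical punycode is accepted.  The codec
    # raises UnicodeError (a ValueError subclass) for empty labels.
    if utf8_name.encode("idna").decode("ascii") != host:
        raise ValueError("host name is not canonical punycode")

    return os.path.join(base, utf8_name)
```

The round-trip comparison is there so that two different ASCII spellings
can never select the same UTF-8 directory.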
