This message contains responses to both James Seng and Soobok Lee.

James Seng <[EMAIL PROTECTED]> wrote:
> My believe is what is allowed in host labels is a topic for the zone
> administrator to decide. .CN have a different set compared to .SG
> compared to .COM compared to say IBM.COM.

Zone administrators can always impose their own restrictions, but that
still leaves us with the question of what the IRI spec should say about
what characters are allowed in the host field of IRIs. The historic
precedent is that ASCII punctuation and symbols are allowed in ASCII
*domain* names, but not in ASCII *host* names, and not in the host field
of URIs. Should IRIs be looser and allow non-ASCII punctuation and
symbols in the host field (while continuing to disallow ASCII
punctuation and symbols)? Or should IRIs try to apply an old tradition
to a new situation, and disallow punctuation and symbols?

Soobok Lee <[EMAIL PROTECTED]> wrote:

> > L: letter
> > M: mark
> > N: number
> > P: punctuation
> > S: symbol
> > Z: separator
> > C: other
>
> May I add this?
>
> U: unassigned code points.

I see your motivation. The classes I listed are all the ones mentioned
in the Unicode character database, but of course the database covers
only assigned code points. All code points not mentioned in the
database are unassigned, and we could view that as another class.

> U should be also allowed in addition to L,M,N.

ToASCII and Nameprep already take an input flag indicating whether
unassigned code points are to be allowed or prohibited. My proposal
wouldn't change that.

AMC
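
As a rough illustration of the classes discussed above, the following Python sketch maps each character to its major class using the standard `unicodedata` module and checks whether a label uses only L, M, and N. The names `major_class` and `label_allowed`, the `allow_unassigned` flag, and the hyphen exception are assumptions made for this example; they are not taken from the IRI draft, ToASCII, or Nameprep. The results also depend on the Unicode version bundled with the Python interpreter, since an unassigned code point may be assigned in a later version.

```python
import unicodedata

# Major classes permitted in a host label under the "L, M, N only" idea.
ALLOWED_CLASSES = {"L", "M", "N"}


def major_class(ch: str) -> str:
    """Return the major class of a character: L, M, N, P, S, Z, C, or U.

    unicodedata reports unassigned code points as "Cn"; this sketch
    splits them out into a separate class "U" (an assumption that
    mirrors the suggestion in the thread).
    """
    cat = unicodedata.category(ch)
    if cat == "Cn":
        return "U"
    return cat[0]


def label_allowed(label: str, allow_unassigned: bool = False) -> bool:
    """Check that every character of a label is in an allowed class.

    The hyphen is accepted as a special case (assumption: it keeps its
    traditional role in host names even though it is punctuation).
    allow_unassigned is a hypothetical flag, loosely analogous to the
    allow/prohibit-unassigned input mentioned in the thread.
    """
    allowed = set(ALLOWED_CLASSES)
    if allow_unassigned:
        allowed.add("U")
    return bool(label) and all(
        ch == "-" or major_class(ch) in allowed for ch in label
    )
```

For instance, `label_allowed("ex_ample")` is rejected because the underscore is punctuation, while a label made of non-ASCII letters passes. Whether unassigned code points pass depends only on the flag, which matches the point that the proposal leaves that decision unchanged.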
