On 6/25/18 1:35 PM, swchang10--- via dev-security-policy wrote:
> On Friday, August 11, 2017 at 6:54:22 AM UTC-7, Peter Bowen wrote:
>> On Thu, Aug 10, 2017 at 1:22 PM, Jonathan Rudenberg via
>> dev-security-policy <[email protected]> wrote:
>>> RFC 5280 section 7.2 and the associated IDNA RFC requires that
>>> Internationalized Domain Names are normalized before encoding to punycode.
>>>
>>> Let’s Encrypt appears to have issued at least three certificates that have
>>> at least one dnsName without the proper Unicode normalization applied.
>>>
>>> It’s also worth noting that RFC 3491 (referenced by RFC 5280 via RFC 3490)
>>> requires normalization form KC, but RFC 5891 which replaces RFC 3491
>>> requires normalization form C. I believe that the BRs and/or RFC 5280
>>> should be updated to reference RFC 5890 and by extension RFC 5891 instead.
>>
>> I did some reading on Unicode normalization today, and it strongly
>> appears that any string that has been normalized to normalization form
>> KC is by definition also in normalization form C. Normalization is
>> idempotent, so doing toNFKC(toNFKC()) will result in the same string
>> as just doing toNFKC() and toNFC(toNFC()) is the same as toNFC().
>> Additionally toNFKC is the same as toNFC(toK()).
>>
>> This means that checking that a string matches the result of
>> toNFC(string) is a valid check regardless of whether using the 349* or
>> 589* RFCs. It does mean that Certlint will not catch strings that are
>> in NFC but not in NFKC.
>>
>> Thanks,
>> Peter
>>
>> P.S. I've yet to find a registered domain name not in NFC, and that
>> includes checking every name in the zone files for all ICANN gTLDs
>> and a few ccTLDs
>
> Hi,
> I have an example international domain that is NFC but not NFKC,
> "xn--ttt-8fa.pumesa.com" (this is a fake domain and my focus is on the
> general pattern).
> The pattern that will cause a domain to be NFC but not NFKC in Golang is:
> "xn--" followed by any same three letters followed by a single "-" followed
> by any single digit number followed by "fa"; now I know this pattern doesn't
> describe real unicode, however the behavior in the programming language is
> curious (below).
> The pattern described above causes strings to be NFC positive but not NFKC in
> Golang; furthermore, I ran a few tests using Golang (version go1.10.3 darwin)
> and Java (version "1.8.0_60") and here is the key parts of the code I used:
> 1) Golang (Used "ToUnicode" to mimic how Zlint tests):
> package main
> import (
>     "fmt"
>     "golang.org/x/net/idna"
>     "golang.org/x/text/unicode/norm"
> )
> func main(){
>     str := "xn--xxx-7fa.pumesa.com"
>     punycode,err := idna.ToUnicode(str)
>     if err != nil {
>         fmt.Println(err)
>     }
>     fmt.Println("Is NFC ", norm.NFC.IsNormalString(punycode))
>     fmt.Println("Is NFKC ", norm.NFKC.IsNormalString(punycode))
> }
>
> The last NFKC check is what causes Zlint to throw an error, stating that the
> unicode is not in compliance, seems that Zlint needs to be updated to follow
> the latest BR (RFC 5891), meaning check if the unicode in question is NFC
> compliant rather than NFKC?
>
> Below is something even more interesting.
>
> 2) Java:
> import java.net.IDN;
> import java.text.Normalizer;
> public class Main{
>     public static void main(String args[]){
>         String cn = "xn--www-0xx.pumesa.com";
>         String punycode = IDN.toASCII(cn);
>         //punycode = IDN.toUnicode(punycode);
>         System.out.println("is NFC " + Normalizer.isNormalized(punycode,
>             Normalizer.Form.NFC));
>         System.out.println("is NFKC " + Normalizer.isNormalized(punycode,
>             Normalizer.Form.NFKC));
>     }
> }
>
> Per Oracle doc, java.net.IDN.toASCII conforms with RFC 3490, and it throws no
> error, this can be double checked within the language by converting the
> punycode back to Unicode, both print statements return true.
>
> So to reiterate, the two main questions are:
> 1) Should there be a discussion about why Oracle Java and Golang don't agree
> on whether this pattern causes unicode to be NFKC compliant?
> The potential impact is that results obtained from a Java system may not be
> Zlint compliant.
> 2) Should Zlint be updated to the latest BR (RFC 5891) regardless of question
> #1?

Probably. However, please be aware that the change from RFC 3490 (IDNA)
to RFC 5891 (IDNA2008) involved more than just a change from Unicode
normalization form KC to Unicode normalization form C.

Also relevant: https://tools.ietf.org/html/rfc8399

Peter
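
For anyone who wants to reproduce the NFC-versus-NFKC difference with only the Java standard library, here is a minimal sketch. The class name NormalizationSketch and the helpers adapt, digit and decodeLabel are hypothetical names made up for illustration. The decoder is a simplified reading of the RFC 3492 decoding procedure with no overflow or malformed-input checks, and it is not how Zlint or java.net.IDN is implemented. Under those assumptions, the label from the quoted example decodes to a string containing U+00B4 (ACUTE ACCENT), which has a compatibility decomposition and is therefore in NFC but not in NFKC. The ASCII A-label itself is normalized in both forms, so a normalization check applied to an ASCII string will return true for NFC and NFKC alike.

```java
import java.text.Normalizer;

public class NormalizationSketch {
    // Punycode parameters from RFC 3492, section 5.
    static final int BASE = 36, TMIN = 1, TMAX = 26, SKEW = 38, DAMP = 700;
    static final int INITIAL_BIAS = 72, INITIAL_N = 128;

    // Bias adaptation, RFC 3492 section 6.1.
    static int adapt(int delta, int numPoints, boolean firstTime) {
        delta = firstTime ? delta / DAMP : delta / 2;
        delta += delta / numPoints;
        int k = 0;
        while (delta > ((BASE - TMIN) * TMAX) / 2) {
            delta /= BASE - TMIN;
            k += BASE;
        }
        return k + ((BASE - TMIN + 1) * delta) / (delta + SKEW);
    }

    // Value of one Punycode digit: a-z -> 0..25, 0-9 -> 26..35.
    static int digit(char c) {
        if (c >= 'a' && c <= 'z') return c - 'a';
        if (c >= 'A' && c <= 'Z') return c - 'A';
        if (c >= '0' && c <= '9') return c - '0' + 26;
        throw new IllegalArgumentException("bad digit: " + c);
    }

    // Decodes one label body (the part after "xn--").
    // Simplified: no overflow checks, malformed input is not handled.
    static String decodeLabel(String input) {
        int d = input.lastIndexOf('-');
        StringBuilder out = new StringBuilder(d > 0 ? input.substring(0, d) : "");
        int pos = d > 0 ? d + 1 : 0;
        int n = INITIAL_N, i = 0, bias = INITIAL_BIAS;
        while (pos < input.length()) {
            int oldI = i, w = 1;
            for (int k = BASE; ; k += BASE) {
                int dg = digit(input.charAt(pos++));
                i += dg * w;
                int t = k <= bias ? TMIN : (k >= bias + TMAX ? TMAX : k - bias);
                if (dg < t) break;
                w *= BASE - t;
            }
            int len = out.codePointCount(0, out.length()) + 1;
            bias = adapt(i - oldI, len, oldI == 0);
            n += i / len;   // code point to insert
            i %= len;       // position to insert it at
            out.insert(out.offsetByCodePoints(0, i), Character.toChars(n));
            i++;
        }
        return out.toString();
    }

    static boolean isNfc(String s) {
        return Normalizer.isNormalized(s, Normalizer.Form.NFC);
    }

    static boolean isNfkc(String s) {
        return Normalizer.isNormalized(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        String aLabel = "xn--ttt-8fa";
        String uLabel = decodeLabel(aLabel.substring(4));
        System.out.println(String.format("code point at index 1: U+%04X",
                uLabel.codePointAt(1)));                        // U+00B4
        System.out.println("U-label is NFC:  " + isNfc(uLabel));  // true
        System.out.println("U-label is NFKC: " + isNfkc(uLabel)); // false
        System.out.println("A-label is NFC:  " + isNfc(aLabel));  // true
        System.out.println("A-label is NFKC: " + isNfkc(aLabel)); // true
    }
}
```

By the same arithmetic, the other single digits in the quoted "xn--" + three letters + "-" + digit + "fa" pattern should decode to U+00B2, U+00B3 or U+00B4, all of which have compatibility decompositions. The quoted Java snippet passes the output of IDN.toASCII to Normalizer, which may be why it reports true for both forms.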
_______________________________________________
dev-security-policy mailing list
[email protected]
https://lists.mozilla.org/listinfo/dev-security-policy

