On 4 Jan 2017, at 08:12, Martin J. Dürst <due...@it.aoyama.ac.jp> wrote:
>
> Hello Alastair,
>
> On 2016/12/06 20:51, Alastair Houghton wrote:
>> Hi all,
>>
>> I must be missing something; in IdnaTest.txt, in the BIDI TESTS section,
>> there are examples like (line 74)
>
> Can you tell us where you got IdnaTest.txt from?
Yes, sorry, I should have included that information. It’s here, with the IDNA mapping table:

  http://www.unicode.org/Public/idna/9.0.0/

which I arrived at from UTS #46 (<http://www.unicode.org/reports/tr46>).

>> B; 0à.\u05D0; ; xn--0-sfa.xn--4db # 0à.א
>>
>> which the file alleges are valid, but I cannot for the life of me see why.
>> First, “0à.א” is clearly a “Bidi domain name” since it has at least one RTL
>> label, “א”. As such, the Bidi Rule (RFC 5893 section 2) should be applied
>> to its labels, and the label “0à” fails [B1], since the first character has
>> Bidi property EN, not L, R or AL.
>
> On first sight, it looks to me as if you're correct.
>
> For the exact interpretation of RFC 5893, you'd better write to the mailing
> list of the former IDNA(bis) WG at idna-upd...@alvestrand.no.

RFC 5893 seems pretty clear to me, and the problem really is that the test vectors (which come from unicode.org) seem, to me, to be incorrect. I therefore think the Unicode list is the right place to raise this issue, but you’re right that it might attract attention from the right people if I also fire off a mail to the IDNA WG list.

>> Similarly (line 93)
>>
>> B; àˇ.\u05D0; ; xn--0ca88g.xn--4db # àˇ.א
>>
>> Again, “àˇ.א” is clearly a “Bidi domain name”, but “àˇ” fails [B6], because
>> “ˇ” has Bidi property ON, not L, EN or NSM.
>>
>> Have I misunderstood something fundamental here? Could someone explain why
>> those examples are valid, in spite of RFC 5893?

As an additional data point, ICU’s IDNA demo web page appears to think these names are OK.

Kind regards,

Alastair.

--
http://alastairs-place.net
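The two rules at issue can be checked mechanically. Below is a minimal Python sketch of the RFC 5893 Bidi Rule (B1 to B6) using only the standard `unicodedata` module. It assumes the domain is already NFC-normalized and split on ".", and the helper names (`bidi_rule_failures`, `check_domain`) are hypothetical, not taken from ICU or any IDNA library.

```python
import unicodedata

# Bidi classes permitted by RFC 5893 rule B2 (RTL labels) and rule B5 (LTR labels).
RTL_ALLOWED = {"R", "AL", "AN", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}
LTR_ALLOWED = {"L", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}


def bidi_classes(label):
    """Return the Unicode Bidi_Class of each code point in the label."""
    return [unicodedata.bidirectional(ch) for ch in label]


def is_bidi_domain(labels):
    """RFC 5893 1.4: a Bidi domain name has at least one label containing R, AL or AN."""
    return any(c in {"R", "AL", "AN"} for label in labels for c in bidi_classes(label))


def bidi_rule_failures(label):
    """Return the list of Bidi Rule conditions (B1..B6) that the label violates."""
    classes = bidi_classes(label)
    if not classes:
        return []
    # B1: the first character must be L, R or AL. If it is not, the label cannot
    # be classified as LTR or RTL, so the remaining checks are skipped.
    if classes[0] not in {"L", "R", "AL"}:
        return ["B1"]
    # B3 and B6 look at the last character after any trailing NSMs. The first
    # character is not NSM, so this list can never become empty.
    core = list(classes)
    while core[-1] == "NSM":
        core.pop()
    last = core[-1]
    failures = []
    if classes[0] in {"R", "AL"}:  # RTL label
        if any(c not in RTL_ALLOWED for c in classes):
            failures.append("B2")
        if last not in {"R", "AL", "EN", "AN"}:
            failures.append("B3")
        if "EN" in classes and "AN" in classes:
            failures.append("B4")
    else:  # LTR label (first character is L)
        if any(c not in LTR_ALLOWED for c in classes):
            failures.append("B5")
        if last not in {"L", "EN"}:
            failures.append("B6")
    return failures


def check_domain(domain):
    """Map each label to its Bidi Rule failures; empty if not a Bidi domain name."""
    labels = domain.split(".")
    if not is_bidi_domain(labels):
        return {}  # The Bidi Rule only applies to Bidi domain names.
    return {label: bidi_rule_failures(label) for label in labels}


if __name__ == "__main__":
    print(check_domain("0\u00e0.\u05d0"))        # test vector from line 74
    print(check_domain("\u00e0\u02c7.\u05d0"))   # test vector from line 93
```

On the two test vectors quoted above, this sketch reports a B1 failure for “0à” (first character is EN) and a B6 failure for “àˇ” (last character is ON). That agrees with the reading of RFC 5893 in this thread rather than with the “valid” status given in IdnaTest.txt. Note that the class data comes from whatever Unicode version the local Python ships with, although the classes of these particular characters are stable.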