Alastair, thanks for finding this and bringing it up. I think you're right: the problem is that the test-generation code doesn't properly apply the Bidi Rule to *all* the labels when *any* of the labels is RTL, and is instead probably just working on a label-by-label basis. Fortunately, according to your note, ICU does handle this correctly. (The test file generation doesn't use the ICU code.)
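The difference between the two behaviors can be sketched in a few lines of Python. This is a minimal illustration, not the ICU code or the Unicode test generator. It assumes the labels are already NFC-normalized U-labels and that the domain has no trailing root dot. `satisfies_bidi_rule` encodes the six conditions of RFC 5893, Section 2, using `unicodedata.bidirectional`. `check_domain` applies the rule to every label once any label is RTL. `check_domain_per_label` is a hypothetical reconstruction of the label-by-label behavior suspected in the test generator. All function names are illustrative. Bidi classes come from the Unicode version bundled with Python, which may differ from the version used to build the test file.

```python
import unicodedata

# Bidi class sets taken from the six conditions in RFC 5893, Section 2.
RTL_MARKERS = {"R", "AL", "AN"}  # any of these makes a label an RTL label
RTL_ALLOWED = {"R", "AL", "AN", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}
LTR_ALLOWED = {"L", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}


def bidi_classes(label: str) -> list:
    return [unicodedata.bidirectional(ch) for ch in label]


def is_rtl_label(label: str) -> bool:
    return any(c in RTL_MARKERS for c in bidi_classes(label))


def satisfies_bidi_rule(label: str) -> bool:
    """Check a single label against the six conditions of the Bidi Rule."""
    classes = bidi_classes(label)
    # Condition 1: the first character must be L, R, or AL.
    if not classes or classes[0] not in {"L", "R", "AL"}:
        return False
    # The "end" of the label (conditions 3 and 6) skips trailing NSMs.
    # This cannot run past index 0, because classes[0] is never NSM here.
    end = len(classes) - 1
    while classes[end] == "NSM":
        end -= 1
    if classes[0] in {"R", "AL"}:
        # Conditions 2-4 apply to an RTL label.
        return (
            all(c in RTL_ALLOWED for c in classes)
            and classes[end] in {"R", "AL", "EN", "AN"}
            and not ("EN" in classes and "AN" in classes)
        )
    # Conditions 5-6 apply to an LTR label.
    return all(c in LTR_ALLOWED for c in classes) and classes[end] in {"L", "EN"}


def check_domain(domain: str) -> bool:
    """Apply the Bidi Rule to *every* label of a Bidi domain name."""
    labels = domain.split(".")
    # A Bidi domain name contains at least one RTL label; if there is none,
    # the rule is not applied at all.
    if not any(is_rtl_label(label) for label in labels):
        return True
    return all(satisfies_bidi_rule(label) for label in labels)


def check_domain_per_label(domain: str) -> bool:
    """Hypothetical label-by-label variant: only RTL labels are checked."""
    return all(
        satisfies_bidi_rule(label)
        for label in domain.split(".")
        if is_rtl_label(label)
    )


if __name__ == "__main__":
    for name in ["0\u00e0.\u05d0", "\u00e0\u02c7.\u05d0", "example.com"]:
        print(name, check_domain(name), check_domain_per_label(name))
```

With these definitions, `check_domain("0\u00e0.\u05d0")` returns False, because "0" has Bidi class EN and fails condition 1. `check_domain_per_label` accepts the same name, because it only examines "א". A purely LTR name such as "0à.com" is not a Bidi domain name, so the rule is never applied to it.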
Could you please report this via http://www.unicode.org/reporting.html so that we can make sure it is tracked and brought to the UTC?

Mark

On Thu, Jan 5, 2017 at 10:46 AM, Alastair Houghton <alast...@alastairs-place.net> wrote:

> On 4 Jan 2017, at 23:40, Markus Scherer <markus....@gmail.com> wrote:
> >
> > On Wed, Jan 4, 2017 at 2:28 AM, Alastair Houghton <alast...@alastairs-place.net> wrote:
> > RFC 5893 seems pretty clear to me, and the problem really is that the test vectors (which come from unicode.org) seem (to me) to be incorrect.
> >
> > https://tools.ietf.org/html/rfc5893#section-2 says "The following rule, consisting of six conditions, applies to labels in Bidi domain names."
> >
> > That's what the ICU code does -- applying the rule to each label -- and I assume that's the basis for the test data.
>
> Absolutely. But the crucial part is “in Bidi domain names”. That is, it applies to *all* labels that are part of a Bidi domain name, not just RTL labels. It did not say “applies to RTL labels in Bidi domain names” and in fact even explicitly states that (in the first bullet point at the end of section 2):
>
> ...Note that even LTR labels and pure ASCII labels have to be tested.
>
> Not to mention the fact that parts 5 and 6 of the rule apply specifically to LTR labels.
>
> So it’s quite clear that given the domain name “0à.א”, both “א” *and* “0à” need to be checked using the Bidi Rule. Unless someone can explain why “0à” does not fail the test, surely we all agree that line 74 is incorrect:
>
> > B; 0à.\u05D0; ; xn--0-sfa.xn--4db # 0à.א
>
> and similarly with line 93:
>
> > B; àˇ.\u05D0; ; xn--0ca88g.xn--4db # àˇ.א
>
> > ICU does not currently check for multi-label bidi combinations.
>
> I was a bit puzzled by this, because the code clearly does (both in the C++ and Java versions) and yet the online demo doesn’t appear to object to the above test cases. So I wrote a quick test program against the C++ version of ICU 58.2 and fed it both test cases, and, sure enough, ICU agrees that there is a BiDi error in both of the above cases.
>
> Kind regards,
>
> Alastair.
>
> --
> http://alastairs-place.net