Edmon Chung <[EMAIL PROTECTED]> wrote:

> -----
> 2) If a string contains any Right-to-Left character (defined as
> belonging to Unicode bidirectional categories "R" and "AL"), the string
> MUST NOT contain any Left-to-Right character (defined as belonging to
> Unicode bidirectional category "L").
>
> 3) If a string contains any Right-to-Left character (as defined above),
> a Right-to-Left character MUST be the first character of the string, and
> a Right-to-Left character MUST be the last character of the string.
> -----
>
> I don't quite understand why we need to have 3.
> Isn't 3 a subset of 2?

No, because there are characters that are neither Right-to-Left nor
Left-to-Right.

> Also this will mean that there cannot be a mixture between RTL and LTR
> characters.

Correct.  That is exactly rule 2 above.  "If X appears then Y must not
appear" is exactly equivalent to "X and Y must not both appear".

> While I am not familiar with Arabic, I sure have seen English words
> mixed with Arabic in phrases, albeit rare.

I don't doubt that.  I know almost nothing about the bidi algorithm, but
the bidi experts concluded that this was the price that needed to be
paid to prevent distinct labels from being displayed identically.

> I didn't see much discussion on the list before on bidi issues, but I
> did see an example used:
>
> > Assume there were two labels inside the DNS, one reading ABCdef and
> > the other reading defABC, and both would be displayed CBAdef. Who
> > would consider that usable for the DNS?
>
> Why would both be displayed the same?

I don't know. :)  The answer can presumably be inferred from the bidi
tech report.

> a given "string" can be a "part" of a label, so there could be two
> "strings", one containing LTR one RTL in the same label.

Nameprep doesn't need to know whether its input string is a label or
part of a label or whatever.  Nameprep could operate on any string.  But
the IDNA spec specifies that nameprep is applied to labels, not to
substrings of labels.  Therefore a label cannot contain both LTR and RTL
characters.

> Please clarify two simple things:
> a. Are mixed RTL and LTR characters allowed within a label?

No.

> b. If there are more categories than R, AL and L that we are discussing,
> then in point 3 it should not say "As defined above":

Yes it should.  Rule 3 is using the same definition of Right-to-Left
character as stated in rule 2: 'belonging to Unicode bidirectional
categories "R" and "AL"'.

AMC
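
P.S. To make the difference between rules 2 and 3 concrete, here is a
rough Python sketch of the two checks as quoted above.  It is only an
illustration, not the nameprep implementation: the function names
(is_rtl, is_ltr, violated_bidi_rules) are made up for this message, and
the bidirectional categories come from whatever Unicode version Python's
unicodedata module ships with, which may differ from the version the
draft refers to.

```python
import unicodedata

# Categories as defined in the quoted rules (hypothetical helper names).
RTL_CATEGORIES = {"R", "AL"}
LTR_CATEGORIES = {"L"}


def is_rtl(ch):
    """True if ch belongs to bidirectional category R or AL."""
    return unicodedata.bidirectional(ch) in RTL_CATEGORIES


def is_ltr(ch):
    """True if ch belongs to bidirectional category L."""
    return unicodedata.bidirectional(ch) in LTR_CATEGORIES


def violated_bidi_rules(s):
    """Return the list of quoted rule numbers (2 and/or 3) that s violates."""
    # Both rules apply only if the string contains an RTL character.
    if not any(is_rtl(ch) for ch in s):
        return []
    violations = []
    # Rule 2: no LTR character may appear alongside an RTL character.
    if any(is_ltr(ch) for ch in s):
        violations.append(2)
    # Rule 3: the first and the last character must both be RTL.
    if not (is_rtl(s[0]) and is_rtl(s[-1])):
        violations.append(3)
    return violations


ALEF = "\u05d0"  # HEBREW LETTER ALEF, category R
BET = "\u05d1"   # HEBREW LETTER BET, category R

print(violated_bidi_rules(ALEF + "1"))        # digit is neither RTL nor LTR
print(violated_bidi_rules(ALEF + "a" + BET))  # LTR letter between RTL letters
```

The first string passes rule 2 (a European digit is category EN, not L)
but fails rule 3 because it does not end with an RTL character.  The
second fails rule 2 but passes rule 3.  So neither rule is a subset of
the other.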
