----- Original Message -----
From: "Dave Crocker" <[EMAIL PROTECTED]>

> Elisabeth,
>
> This entire topic, and all its proposals, have very much been taken into
> account by the IDN working group. They have been taken into account at the
> cost of many months of delay, although this topic is actually outside the
> scope of the working group.
>
> The topic calls for an algorithm that equates portions of different
> scripts. This goes beyond the model of equating upper/lower case WITHIN a
> script.

No. You may be pointing to my half-baked draft, "look-alike normalization +
multicase ACE equivalence encoding ACROSS Cyrillic/Greek/Latin scripts", not to
CDNC's TSCONV-02 draft, which attempts to add TC/SC 1:1 equivalence WITHIN the
unified Han script block by borrowing the framework outlined in my pre-draft.
That may have led you to mix up the two. TSCONV-02 is succeeded by TSCONV-03,
which takes a brand-new validation-based TC/SC filtering approach.

>
> In fact, this topic is an open research question with no generally accepted
> practise. So even if the topic were within scope the solution would, at
> best, be very, very risky.
>
> The risk is exacerbated by the fact that this technical approach does not
> scale well. As soon as an approach like the TC/SC proposal is added, then
> we must find mappings for many, many other multi-script equivalences. That
> effort will probably take years.

True. There is a huge set of "look-similar" equivalences in Unicode!
But, fortunately, we have a much smaller set of "look-identical" equivalences.
For example, each set of equivalent Cyrillic/Greek/Cherokee/Latin characters
is relatively small, and the equivalent pairs are more easily found than
"look-similar" ones. If we restrict the problem space to "look-identical"
equivalence, we will reach the ideal goal faster and avoid the scalability
problem in the proposed multicase encoding.

As for "look-similar" characters, we can recommend new disambiguating font
sets for IDN representations. For LDH domains, we already have some font sets
that draw '0' as a slashed zero so that it is easily distinguished from the
letter 'o'.

Soobok Lee

>
> d/
>
>
> ----------
> Dave Crocker <mailto:[EMAIL PROTECTED]>
> Brandenburg InternetWorking <http://www.brandenburg.com>
> tel +1.408.246.8253; fax +1.408.273.6464
>
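
To make the "look-identical" restriction above concrete, here is a minimal
sketch in Python. It is only an illustration under stated assumptions: the
table below is a hypothetical, deliberately tiny sample of Cyrillic and Greek
letters that are commonly drawn the same as a Latin letter, and the function
names are invented for this example. It folds characters to a single
representative, which is a simpler technique than the multicase ACE encoding
discussed in the draft, and it is not taken from any draft.

```python
# Hypothetical sample table: each key is a Cyrillic or Greek letter that is
# usually rendered identically to the Latin letter it maps to. A real table
# would need careful review; this one is for illustration only.
LOOK_IDENTICAL = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
}


def fold_look_identical(label: str) -> str:
    """Replace every character found in the table with its Latin twin."""
    return "".join(LOOK_IDENTICAL.get(ch, ch) for ch in label)


def same_appearance(a: str, b: str) -> bool:
    """Treat two labels as equivalent if they fold to the same string."""
    return fold_look_identical(a) == fold_look_identical(b)


if __name__ == "__main__":
    latin = "cop"
    cyrillic = "\u0441\u043e\u0440"  # Cyrillic es, o, er
    print(same_appearance(latin, cyrillic))
    print(same_appearance("c0p", "cop"))
```

Because the table lists only pairs that look identical, it stays small. The
digit '0' and the letter 'o' are merely "look-similar", so they are left to
the font-based disambiguation suggested above and are not folded here.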
