Hi,

----- Original Message -----
From: "Martin Duerst" <[EMAIL PROTECTED]>

> >one idea (which I don't particularly like) is to assume that all characters
> >within a single label are from a single language, and if the same glyph
> >maps to different code points (indicating characters from different languages)
> >then you resolve the ambiguity by using the code point that creates the
> >fewest number of language changes. I won't even begin to list the problems
> >with this; I mention it only because I think that this approximates the
> >behavior that is most natural for human beings.
>
> I think this is worth trying, in order to get rid of the famous 'A' for
> Latin, Greek, and Cyrillic. It's of course to be done on a per-script
> basis, not per language. I wouldn't actually resolve by tweaking
> codepoints (sometimes it will be very difficult to decide which
> codepoint to tweak), but just by rejecting strange combinations.
> You have to do a keyboard switch to get from one script to the other,
> so the chance of getting a mixture accidentally isn't great.
> Doing the check only on the registration side may also be a very
> good idea; that may allow us to start with very tight rules and
> expand them later (e.g. allow scripts separated by a hyphen,...).
> It would also help a lot to address some bidirectionality problems.

Good idea.

Now, let's think about another case: an all-Greek "oo.com" and an
all-Latin "oo.com". Each of the two uses characters from a single
script only, but the two still look very similar.
Do you have any good ideas about this?

Regards,

Soobok Lee
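
For illustration only, here is a rough sketch of the "reject strange combinations" check discussed above, as it might run on the registration side. It is not anyone's proposed implementation. Python's standard library does not expose the Unicode Script property, so the sketch assumes that the first word of a character's Unicode name (e.g. "LATIN", "GREEK", "CYRILLIC") is a good enough stand-in for its script. The function names scripts_in_label and is_single_script are hypothetical.

```python
import unicodedata


def scripts_in_label(label):
    """Return the set of approximate script names used in a label.

    Assumption: the first word of the Unicode character name stands in
    for the script. This is a simplification, not the real Script property.
    """
    scripts = set()
    for ch in label:
        # Digits and hyphens are shared by all scripts, so skip them.
        if ch.isdigit() or ch == "-":
            continue
        name = unicodedata.name(ch, "")
        scripts.add(name.split(" ", 1)[0] if name else "UNKNOWN")
    return scripts


def is_single_script(label):
    """Accept a label only if all its letters come from one script."""
    return len(scripts_in_label(label)) <= 1


latin_oo = "oo"              # two LATIN SMALL LETTER O
greek_oo = "\u03bf\u03bf"    # two GREEK SMALL LETTER OMICRON
mixed = "p\u0430ypal"        # Latin letters plus CYRILLIC SMALL LETTER A

print(is_single_script(mixed))     # False: Latin and Cyrillic are mixed
print(is_single_script(latin_oo))  # True
print(is_single_script(greek_oo))  # True
print(latin_oo == greek_oo)        # False: different code points
```

Under these assumptions the mixed-script label is rejected, but the all-Latin and all-Greek "oo" labels both pass while being different strings. That is exactly the case raised in the reply: a per-label single-script rule on its own does not address it.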
