----- Original Message -----
From: "Martin Duerst" <[EMAIL PROTECTED]>
To: "Soobok Lee" <[EMAIL PROTECTED]>; "Soobok Lee" <[EMAIL PROTECTED]>; "Dan Ebert" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Thursday, October 18, 2001 6:32 PM
Subject: Re: [idn] case preservation

> At 16:13 01/10/18 +0900, Soobok Lee wrote:
>
> > > > There is indeed a non-zero (but very, very small) probability
> > > > for such cases. But if domain names are written in lower case
> > > > the way they mostly have been up to now, a word in a language
> > > > written in Cyrillic looking the same as a word in a language
> > > > written in Latin would be about as rare as a four-leaf clover.
> > > >
> >
> >No. Much more frequent than you guess.
> >Cyrillic small 'a' 'e' 'o' 'c' 'p' 'x' 'y' 'i' 'j' 's' look exactly
> >the same as the Latin small ones.
>
> Yes. But 'i' 'j' 's' are not actually used in most languages
> that are written in Cyrillic. And in all languages, most of
> the possible letter combinations are not actually used. And
> the longer a word is, the more quickly the probabilities
> approach zero.
>

The _SUM_ of the probabilities over every word length 1..N (for large N)
may converge to a certain non-zero value that should not be neglected.
With 'HMTB', it will be much bigger than that.

It is well known that all 3-letter LDH labels are registered under
.com, .net and .org. Every 3-letter Cyrillic label built from 'aeocpxy'
collides with a 3-letter LDH label under .com.

"copy.com" "coca.com" "ec.com" "ace.com" "eco.com" "ocx.com" "oxy.com"
"cap.com" .... :-))

Cyrillic 'i' 'j' 's' are not used in Russian. But, as you know,
'j' is used in Serbian and in Azerbaijani written in Cyrillic script,
and 'i' is the Byelorussian-Ukrainian 'I'. Are these characters extinct
or live?

Soobok Lee

>
> Regards,   Martin.
>
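
To make the collision argument above concrete, here is a minimal Python sketch. It assumes only the ten look-alike letters listed in this message; the table CYRILLIC_TO_LATIN and the helpers latin_lookalike() and count_labels() are hypothetical names for illustration, not part of any IDN draft or library. It does not check whether a label is actually registered; it only shows which Latin string a Cyrillic label would be mistaken for, and how many labels can be built from the 'aeocpxy' subset.

```python
# Cyrillic small letters that this message lists as looking like Latin ones.
# Keys are Cyrillic code points, values are the Latin look-alikes.
CYRILLIC_TO_LATIN = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
    "\u0443": "y",  # CYRILLIC SMALL LETTER U
    "\u0456": "i",  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
    "\u0458": "j",  # CYRILLIC SMALL LETTER JE
    "\u0455": "s",  # CYRILLIC SMALL LETTER DZE
}

# The seven letters ('aeocpxy') that are also used in Russian.
RUSSIAN_SUBSET = "\u0430\u0435\u043e\u0441\u0440\u0445\u0443"


def latin_lookalike(label):
    """Return the Latin string a Cyrillic label looks like, or None
    if any character has no look-alike in the table above."""
    out = []
    for ch in label:
        if ch not in CYRILLIC_TO_LATIN:
            return None
        out.append(CYRILLIC_TO_LATIN[ch])
    return "".join(out)


def count_labels(alphabet, max_len):
    """Number of distinct labels of length 1..max_len over the alphabet."""
    return sum(len(alphabet) ** n for n in range(1, max_len + 1))


if __name__ == "__main__":
    # A Cyrillic label spelled with ES, O, ER, U.
    print(latin_lookalike("\u0441\u043e\u0440\u0443"))
    # Labels of exactly three letters over the seven-letter subset.
    print(len(RUSSIAN_SUBSET) ** 3)
    # Labels of length 1..3 over the same subset.
    print(count_labels(RUSSIAN_SUBSET, 3))
```

Under these assumptions there are 7^3 = 343 three-letter labels over 'aeocpxy' alone, each of which renders identically to some three-letter LDH label.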
