RE: [idn] homograph attacks

Michel Suignard Tue, 15 Feb 2005 22:17:42 -0800

No language used in the former Soviet Union should require a mix of Latin and
Cyrillic in a single DNS label.
Unicode contains many Latin homographs in the Cyrillic block for exactly that
reason: to avoid mixing the two scripts in a single word. It is unfortunate
that the exact visual match is now haunting us. However, it should not be used
as a rationale for TLD registries to accept registration of mixed
Cyrillic/Latin labels.
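
The visual match is easy to demonstrate. This minimal Python sketch uses an
illustrative label ("paypal" is my choice, not from this thread) in which
U+0440 CYRILLIC SMALL LETTER ER and U+0430 CYRILLIC SMALL LETTER A replace the
Latin "p" and "a". The two strings render alike in many fonts, but they compare
unequal and produce different A-labels through Python's built-in IDNA 2003
codec.

```python
import unicodedata

latin = "paypal"
# Illustrative spoof: Cyrillic er (U+0440) and a (U+0430), then Latin "ypal".
spoof = "\u0440\u0430ypal"

# Different code points, so the strings are not equal.
print(latin == spoof)

# Character names reveal the mixed scripts hidden by the visual match.
for ch in spoof[:3]:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Python's "idna" codec (IDNA 2003) leaves the ASCII label as-is and turns
# the mixed label into a distinct xn-- A-label.
print(latin.encode("idna"))
print(spoof.encode("idna"))
```

Nothing in the encoding step flags the mix; it has to be caught by registry
policy.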


To answer another message in this thread: there is no definitive answer about
which Unicode characters are allowed for a given language. But in all
languages that have a reasonable concept of 'words', you should never need to
allow mixed scripts in a word, at least in the context of IDN labels. There are
exceptions to this rule, as in South and East Asia (Japanese comes to mind),
but these languages can be detected reasonably well using the Unicode Script
property.
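
Python's standard library does not expose the Unicode Script property directly
(ICU and the third-party `regex` module do). This sketch therefore approximates
a character's script by the first word of its Unicode name. It treats ASCII
digits and the hyphen as script-neutral and accepts a multi-script label only
when it mixes Hiragana, Katakana and CJK ideographs, as a stand-in for the
Japanese exception. The helper names and the exception set are assumptions for
illustration, not any registry's actual policy.

```python
import unicodedata

# Assumed script-neutral characters: ASCII digits and the hyphen.
NEUTRAL = set("0123456789-")

# Assumed Japanese exception: these name-derived "scripts" may be mixed.
JAPANESE_MIX = {"HIRAGANA", "KATAKANA", "CJK"}


def approx_script(ch: str) -> str:
    """Approximate the script from the first word of the Unicode name.

    Crude stand-in for the real Script property, e.g.
    "CYRILLIC SMALL LETTER A" -> "CYRILLIC".
    """
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def label_scripts(label: str) -> set[str]:
    """Collect the approximate scripts of all non-neutral characters."""
    return {approx_script(ch) for ch in label if ch not in NEUTRAL}


def is_acceptable_label(label: str) -> bool:
    """Accept single-script labels, plus the assumed Japanese mix."""
    scripts = label_scripts(label)
    if len(scripts) <= 1:
        return True
    return scripts <= JAPANESE_MIX


for label in ["пример", "example-2", "\u0440\u0430ypal", "すし寿司"]:
    print(label, sorted(label_scripts(label)), is_acceptable_label(label))
```

With these rules, an all-Cyrillic label and an all-Latin label pass, the
Cyrillic/Latin spoof is rejected, and a Hiragana-plus-kanji label is accepted.
The name-based approximation is rough: some characters, such as the
Katakana-Hiragana prolonged sound mark, have names that do not start with a
single script name, so a real check should use the actual Script property.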

Michel 

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Kane, Pat

VeriSign does prevent domains with the Russian language tag from commingling
A-Z with Cyrillic characters. It does permit 0-9 and the dash (hyphen) to be
used. This filter also applies to other Cyrillic-based languages such as
Belarusian, Ukrainian, Serbian, Macedonian and Bulgarian.
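
A filter like the one described here can be sketched as a per-tag whitelist.
This is a hypothetical reconstruction, not VeriSign's code. It assumes ISO
639-2 codes as the language tags, treats only the basic Cyrillic block (U+0400
to U+04FF) as Cyrillic letters, and rejects every other character, including
A-Z.

```python
# Hypothetical tag set: ISO 639-2 codes for Russian, Belarusian, Ukrainian,
# Serbian, Macedonian and Bulgarian (the tag format is an assumption).
CYRILLIC_TAGS = {"rus", "bel", "ukr", "srp", "mkd", "bul"}

# Characters the filter permits alongside Cyrillic letters: 0-9 and the hyphen.
EXTRA_ALLOWED = set("0123456789-")


def is_basic_cyrillic(ch: str) -> bool:
    """True for characters in the basic Cyrillic block, U+0400..U+04FF."""
    return "\u0400" <= ch <= "\u04ff"


def allowed_for_tag(label: str, tag: str) -> bool:
    """Check a label against the sketched Cyrillic-language filter."""
    if tag not in CYRILLIC_TAGS:
        return True  # only the Cyrillic filter is modelled here
    return all(is_basic_cyrillic(ch) or ch in EXTRA_ALLOWED for ch in label)


print(allowed_for_tag("пример-2005", "rus"))       # True
print(allowed_for_tag("\u0440\u0430ypal", "rus"))  # False: Latin "ypal"
```

Tags outside the set, such as Tajik (`tgk`), pass through unchanged here, which
leaves room for the mixed-script case discussed below.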

There are other languages listed in ISO 639-2 that today use a combination of
Latin and Cyrillic because they were originally Latin-based (Tajik was written
in Arabic script before being Latin-based), migrated to Cyrillic during the
Soviet era, and are now migrating back to Latin. From what I understand, not
being a native speaker, it is common to use both Latin and Cyrillic characters
in Tajik. Granted, there are not many Tajik registrations in .com and .net,
but that is precisely the point of IDN.

Pat Kane
