e-gold and e-go1d

2008-11-29 Thread James A. Donald

To implement Zooko's triangle, one has to detect names
that may look alike, for example e-gold and e-go1d.

This is a lot of code.  Has someone already written such
a collision detector that I could swipe?

The algorithm is to map all lookalike glyphs to
canonical glyphs: l and 1 are mapped to l, O and 0
are mapped to O, lower-case o and the Greek omicron are
mapped to lower-case o, and so on.  For each pair of
strings, one then does a character-by-character diff of
the canonical forms, and pairs with suspiciously short
diffs might be confused by end users.
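A minimal Python sketch of the map-then-diff idea, assuming a tiny
hand-written confusables table. CANONICAL, canonicalize, char_diff and
suspicious_pairs are illustrative names, not an existing library. The
"diff" here is simply the number of mismatched positions plus any length
difference; a real detector would need a far larger table and probably
an edit distance that tolerates insertions and deletions.

```python
# Hypothetical, deliberately tiny confusables table; a usable one would
# have to be much larger.
CANONICAL = {
    "1": "l",       # digit one     -> lower-case L
    "0": "O",       # digit zero    -> capital O
    "\u03bf": "o",  # Greek omicron -> Latin lower-case o
}


def canonicalize(name: str) -> str:
    """Map every lookalike glyph to its canonical glyph."""
    return "".join(CANONICAL.get(ch, ch) for ch in name)


def char_diff(a: str, b: str) -> int:
    """Count positions where the strings differ, plus any length difference."""
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches + abs(len(a) - len(b))


def suspicious_pairs(names, max_diff=1):
    """Return pairs whose canonical forms differ in at most max_diff places."""
    canon = [canonicalize(n) for n in names]
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if char_diff(canon[i], canon[j]) <= max_diff:
                pairs.append((names[i], names[j]))
    return pairs
```

suspicious_pairs(["e-gold", "e-go1d", "paypal"]) returns
[("e-gold", "e-go1d")], since both names canonicalize to e-gold. The
pairwise loop is quadratic in the number of names, which is fine for a
sketch but not for a large directory.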

The program then asks the user either for a qualification
to distinguish one or both of the names (the default
being "first" and "second"), or to deprecate one of the
entities as scam or spam, or to say that he does not
care if new entries have the same or a similar name as
this particular existing entry.
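One way to record those three user decisions, sketched in Python.
Resolution, Entry and resolve are hypothetical names chosen for
illustration; only the default qualifiers "first" and "second" come from
the description above, and the rest of the structure is an assumption.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Resolution(Enum):
    QUALIFY = auto()    # add a distinguishing qualification to the names
    DEPRECATE = auto()  # mark one of the entities as scam or spam
    DONT_CARE = auto()  # stop warning about lookalikes of this entry


@dataclass
class Entry:
    name: str
    qualifier: str = ""
    deprecated: bool = False
    ignore_lookalikes: bool = False

    def display(self) -> str:
        shown = f"{self.name} ({self.qualifier})" if self.qualifier else self.name
        return shown + (" [deprecated]" if self.deprecated else "")


def resolve(existing: Entry, new: Entry, choice: Resolution,
            qualifiers=("first", "second"),
            target: Optional[Entry] = None) -> None:
    """Apply the user's decision about a suspicious pair (hypothetical API)."""
    if choice is Resolution.QUALIFY:
        # An empty string leaves that name unqualified, so the user can
        # qualify just one of the two names.
        existing.qualifier, new.qualifier = qualifiers
    elif choice is Resolution.DEPRECATE:
        # Deprecate the chosen entity; default to the newcomer.
        victim = target if target is not None else new
        victim.deprecated = True
    elif choice is Resolution.DONT_CARE:
        # New entries may now share or resemble this entry's name silently.
        existing.ignore_lookalikes = True
```

With the defaults, qualifying the e-gold/e-go1d pair displays them as
"e-gold (first)" and "e-go1d (second)".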

-
The Cryptography Mailing List
Unsubscribe by sending unsubscribe cryptography to [EMAIL PROTECTED]


Re: e-gold and e-go1d

2008-11-29 Thread Ivan Krstić

On Nov 29, 2008, at 9:18 AM, James A. Donald wrote:

> The algorithm is to map all lookalike glyphs to
> canonical glyphs

The definition of lookalike glyphs depends on the choice of font and  
variant, and Unicode wraps the whole problem in a lovely layer of  
hell. If I had to do this, I'd investigate rendering both strings in  
the (same) target font and then quantifying the amount of overlap in  
the bitmaps, as e.g. SWORD does for TLDs:


http://icann.sword-group.com/icann-algorithm/Default.aspx
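A toy illustration of the bitmap-overlap idea, and not SWORD's algorithm:
each string is drawn in a made-up 3x5 pixel font that covers only the
characters of e-gold and e-go1d, and similarity is the
intersection-over-union of the lit pixels. A real tool would rasterize
the strings in the actual target font; FONT, render and overlap are
hypothetical names.

```python
# Hypothetical 3x5 bitmap font; "#" is a lit pixel.
FONT = {
    "e": ["###", "#.#", "###", "#..", "###"],
    "-": ["...", "...", "###", "...", "..."],
    "g": ["###", "#.#", "###", "..#", "###"],
    "o": ["...", "###", "#.#", "#.#", "###"],
    "l": [".#.", ".#.", ".#.", ".#.", ".#."],
    "1": [".#.", "##.", ".#.", ".#.", "###"],
    "d": ["..#", "###", "#.#", "#.#", "###"],
}


def render(text: str) -> set:
    """Return the set of lit (row, col) pixels for text in the toy font.

    Characters missing from FONT raise KeyError.
    """
    pixels = set()
    x = 0
    for ch in text:
        glyph = FONT[ch]
        for r, row in enumerate(glyph):
            for c, cell in enumerate(row):
                if cell == "#":
                    pixels.add((r, x + c))
        x += len(glyph[0]) + 1  # one blank column between glyphs
    return pixels


def overlap(a: str, b: str) -> float:
    """Intersection-over-union of the two rendered bitmaps (1.0 = identical)."""
    pa, pb = render(a), render(b)
    union = pa | pb
    if not union:
        return 1.0
    return len(pa & pb) / len(union)
```

In this font overlap("l", "1") is 0.625 (5 shared pixels out of 8 lit),
far above overlap("o", "l") at 2/13, and e-gold against e-go1d scores
about 0.95. The score depends entirely on the font, which is the point
made above.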

SWORD's algorithm is proprietary; NIST's Paul Black has Python code
available for a slightly enhanced Levenshtein distance:


http://hissa.nist.gov/~black/GTLD/
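For comparison, a plain Levenshtein distance in Python with a pluggable
substitution cost. It does not reproduce Paul Black's enhancements, which
are not described here; lookalike_cost, which makes l/1 and O/0
substitutions cheap, is an illustrative assumption of one way such a
distance could be tuned for lookalikes.

```python
from typing import Callable


def unit_cost(x: str, y: str) -> float:
    """Classic Levenshtein substitution cost: 0 for a match, 1 otherwise."""
    return 0 if x == y else 1


def levenshtein(a: str, b: str,
                sub_cost: Callable[[str, str], float] = unit_cost) -> float:
    """Edit distance with unit insert/delete and a pluggable substitution cost."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,                     # delete ca
                cur[j - 1] + 1,                  # insert cb
                prev[j - 1] + sub_cost(ca, cb),  # substitute or match
            ))
        prev = cur
    return prev[-1]


# Hypothetical lookalike pairs that should count as near-identical.
LOOKALIKES = {frozenset("l1"), frozenset("O0")}


def lookalike_cost(x: str, y: str) -> float:
    if x == y:
        return 0
    return 0.25 if frozenset((x, y)) in LOOKALIKES else 1
```

levenshtein("e-gold", "e-go1d") is 1, the same as for any one-letter
typo; with lookalike_cost it drops to 0.25, so visually confusable names
rank as closer than merely similar ones.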

--
Ivan Krstić [EMAIL PROTECTED] | http://radian.org