On Dec 26th Alan Grimes said:
 
>> !+ d03$n'7 vv0rk b3cuz $uch 4 $!st3m c4n'+ r34d m! 31337 +3x+.

It doesn't work because such a system can't read my ????? text

The dialect you just wrote in is a fairly easy substitution cipher for
humans to break in their heads because the replacement values share
visual or phonetic characteristics with the characters they replace.

Since NLP programs match their patterns against ASCII values, which
carry none of those visual or phonetic characteristics, it is not
unreasonable that such a program would be unable to read this unless
either its patterns were encoded to accept these simple replacement
values as valid substitutions or a cryptographic module was added as a
front end to break the cipher.  Either option has a very low payback for
the work required right now.

My matching system does use Levenshtein distance and dictionaries
sorted by most frequently used words to do intelligent substitutions for
incorrectly spelled words, so in your example above, words like $uch
and m! at least would be resolved correctly.
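
To illustrate the idea, here is a rough Python sketch of correction by
Levenshtein distance against a frequency-sorted dictionary.  It is not
my actual matching code: the word list, the function names and the
max_distance cutoff are all made up for the example.  Ties in distance
go to whichever word appears earlier in the list, i.e. the more
frequent one.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


# Hypothetical dictionary, ordered from most to least frequently used.
WORDS_BY_FREQUENCY = ["my", "such", "me", "much", "text", "system"]


def correct(token, words=WORDS_BY_FREQUENCY, max_distance=1):
    """Return the closest dictionary word within max_distance edits.

    The strict '<' comparison keeps the first (most frequent) word on
    ties.  If nothing is close enough, the token comes back unchanged.
    """
    token = token.lower()
    best, best_distance = None, max_distance + 1
    for word in words:
        distance = levenshtein(token, word)
        if distance < best_distance:
            best, best_distance = word, distance
    return best if best is not None else token
```

With this toy list, correct("$uch") and correct("m!") each land one
edit away from a real word, while a token like 31337 is too far from
anything and is left alone.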

If the above Hackerese dialect becomes a popular judge ruse in the
Turing competitions, I imagine I could add the popular hacker
substitutions into my patterns fairly easily using a tool like sed, but
performance and readability of the patterns would tend to suffer.
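
As a rough sketch of what that rewriting might look like, here it is
in Python regular expressions rather than sed, purely for
illustration.  The substitution table is my guess at a few common
replacements and the function name is hypothetical.  Each letter with
known variants becomes a bracketed character class, which is exactly
where the readability goes.

```python
import re

# Assumed table of single-character hacker substitutions (illustrative only).
LEET_VARIANTS = {
    "a": "4",
    "e": "3",
    "i": "!1",
    "l": "1",
    "o": "0",
    "s": "$5",
    "t": "+7",
    "y": "!",
}


def expand_pattern(word):
    """Turn a plain word into a regex that also accepts the variants above."""
    parts = []
    for ch in word.lower():
        variants = LEET_VARIANTS.get(ch, "")
        if variants:
            # Character class holding the letter and its replacements.
            parts.append("[" + re.escape(ch + variants) + "]")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def matches(word, token):
    """True if token is word written with the substitutions above."""
    return re.fullmatch(expand_pattern(word), token.lower()) is not None
```

Under these assumptions matches("system", "$!st3m") and
matches("text", "+3x+") both succeed.  Multi-character replacements
such as vv for w, and phonetic respellings such as b3cuz, are not
covered by a one-character table and would need separate handling.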

By the way, I consider most 11 and 12 year olds fairly intelligent, and
the few I asked were not able to translate this sentence into English.
I was also unable to translate 31337 correctly, even in context, so I
guess that unless the 31337 was what the cryptographers call a null
value, inserted to make decryption harder, I would fail the Turing test
here as well!

