Re: Charset normalization issue (report, patch, and request)

Motoharu Kubo Sat, 14 Jan 2006 20:21:38 -0800

MATSUDA Yoh-ichi wrote:

Spammers' word obfuscation techniques are not limited to splitting words with LF.
'o' -> '0', 'i' -> '1', 'l' -> '|', 'a' -> '@', and many more...
Tokenization isn't suited to these techniques.

Just an idea. If there were good proofreading software, we could detect this kind of obfuscation universally in splitter(). Then we could tell the test rules that obfuscation was detected, by inserting a special mark or by some other means.
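
To make the idea concrete, here is a rough sketch in Python rather than in the project's own code. It is not a proposal for the actual splitter() implementation: the look-alike table only covers the four substitutions quoted above, and the names mark_obfuscation, LOOKALIKES, MARK, and the word list are all hypothetical. A plain set of known words stands in for the "proofreading software".

```python
# Hypothetical look-alike map, limited to the examples quoted in this thread.
LOOKALIKES = str.maketrans({"0": "o", "1": "i", "|": "l", "@": "a"})

# Hypothetical marker token that a test rule could match on.
MARK = "__OBFUSCATED__"


def mark_obfuscation(text, wordlist):
    """Append MARK after each token that only becomes a known word
    once look-alike characters are mapped back to letters."""
    out = []
    for token in text.split():
        lowered = token.lower()
        plain = lowered.translate(LOOKALIKES)
        out.append(token)
        # Flag the token only if the substitution changed something
        # and the result is a word we recognize.
        if plain != lowered and plain in wordlist:
            out.append(MARK)
    return " ".join(out)


if __name__ == "__main__":
    known = {"viagra"}  # stand-in for a real dictionary or spell checker
    print(mark_obfuscation("buy v1@gra now", known))
```

The original token is kept and the mark is added next to it, so existing rules still see the text unchanged. Note that this sketch collapses runs of whitespace, which a real splitter would probably want to preserve.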


--
Motoharu Kubo
[EMAIL PROTECTED]
