Hello.

From: Motoharu Kubo <[EMAIL PROTECTED]>
Subject: Re: Charset normalization issue (report, patch, and request)
Date: Sun, 15 Jan 2006 13:21:15 +0900

> MATSUDA Yoh-ichi wrote:
> > Spammers' word obfuscation techniques are not limited to splitting
> > words with LF.
> > 'o' -> '0', 'i' -> '1', 'l' -> '|', 'a' -> '@', and many more...
> > Tokenization isn't a good fit for these techniques.
>
> Just an idea.  If there were good proofreading software, we could detect
> this kind of obfuscation universally in splitter().  Then we could tell
> test rules that obfuscation was detected by inserting a special mark or
> by some other means.

But some domain names, for example, look like obfuscated words.
Not all mail text is written in natural-language words.
I think regex-based detection is a poor fit for word obfuscation tricks.
That is a job for Bayes.

-- 
Japanese spam EXPO :-p http://www.flcl.org/~yoh/spam/jp/
MATSUDA Yoh-ichi(yoh) mailto:[EMAIL PROTECTED]
http://www.flcl.org/~yoh/diary/ (only Japanese)
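
To illustrate the false-positive concern raised above, here is a minimal sketch in Python. It is not SpamAssassin code and is unrelated to splitter(). The substitution table comes from the examples in the message; the names LEET_MAP, WORDS, normalize and looks_obfuscated, the tiny word list, and the host label "n0de" are all hypothetical.

```python
# Hypothetical substitution table built from the examples in the message:
# 'o' -> '0', 'i' -> '1', 'l' -> '|', 'a' -> '@' (applied in reverse here).
LEET_MAP = str.maketrans({"0": "o", "1": "i", "|": "l", "@": "a"})

# Tiny stand-in for a dictionary of natural words (assumption).
WORDS = {"viagra", "loan", "node"}


def normalize(token: str) -> str:
    """Lowercase the token and undo the character substitutions."""
    return token.lower().translate(LEET_MAP)


def looks_obfuscated(token: str) -> bool:
    """Flag a token that becomes a dictionary word only after normalizing."""
    plain = normalize(token)
    return plain != token.lower() and plain in WORDS


if __name__ == "__main__":
    # "n0de" stands for a host label such as n0de.example.com (hypothetical).
    for token in ["v1@gra", "|0@n", "viagra", "n0de"]:
        print(token, looks_obfuscated(token))
```

A rule like this flags "v1@gra", but it flags the host label "n0de" as well, which is the kind of ambiguity that a fixed mapping cannot resolve and a statistical classifier can weigh against other evidence.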
