MATSUDA Yoh-ichi wrote:
Spammers' word obfuscation techniques are not limited to separating words with LF.
'o' -> '0', 'i' -> '1', 'l' -> '|', 'a' -> '@', and many more...
Tokenization is not suited to these techniques.

Just an idea: if good proofreading software were available, we could detect this kind of obfuscation generically in splitter(). We could then tell the test rules that obfuscation was detected, by inserting a special mark or by some other means.
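
To make the idea concrete, here is a rough sketch in Python. It is not the actual splitter(); the function name split_with_marks, the marker token, the substitution table, and the tiny word list are all made up for illustration. The word list stands in for the proofreading software, and the table covers only the four substitutions quoted above.

```python
# Hypothetical substitution table, taken from the examples in the quoted message.
LEET_MAP = str.maketrans({"0": "o", "1": "i", "|": "l", "@": "a"})

# Hypothetical marker token that a test rule could match on.
OBFUSCATION_MARK = "__OBFUSCATED__"

# Stand-in for a real dictionary or proofreading tool (assumption).
KNOWN_WORDS = {"viagra", "pills", "mail", "online"}


def split_with_marks(text, known_words=KNOWN_WORDS):
    """Split text on whitespace and flag tokens that look obfuscated.

    A token is flagged only if undoing the substitutions changes it AND
    the result is a known word, so plain numbers such as "2010" are left alone.
    """
    tokens = []
    for raw in text.split():
        word = raw.lower()
        normalized = word.translate(LEET_MAP)
        if normalized != word and normalized in known_words:
            tokens.append(normalized)        # the de-obfuscated word
            tokens.append(OBFUSCATION_MARK)  # special mark for the test rules
        else:
            tokens.append(word)
    return tokens


print(split_with_marks("Buy v1@gra p1lls 0nline"))
```

Emitting the mark as a separate token right after the restored word would let the rules score both the word itself and the fact that it was disguised. Whether the mark should be a token or some other kind of flag is left open here.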

--
Motoharu Kubo
[EMAIL PROTECTED]