"Loren Wilton" writes:
> > Currently, Bayes is the only code that actually *uses* knowledge of how a
> > string is tokenized into words; this isn't exposed to the rules at all.
>
> This isn't even slightly true! Virtually every rule written against English
> spam is in some way concerned with word breaks. In some cases in
> obfuscation rules the rule may be concerned with ignoring word breaks. In
> many cases like /you have already won!/i there are implicit word breaks in
> the rule. Other rules use \b to require word breaks and prevent erroneous
> matches. If breaks were completely arbitrary, the language would be nigh
> unto unreadable, and virtually all existing rules would fail!

You're misunderstanding me. Of course the people who write rules are
concerned with where the word breaks land. However, the rule-type code
doesn't have any knowledge of word breaks; it's just matching a string of
text against a regexp. Bayes is the only rule-type code that does.

--j.