http://bugzilla.spamassassin.org/show_bug.cgi?id=3661
------- Additional Comments From [EMAIL PROTECTED]  2005-03-13 23:39 -------
Subject: Re: Request for HTML de-obfuscation of invisible SPAN's

> > I can almost see having two body types: body, which is using the current
> > rendering, and cleanbody, which uses the cleaned up rendering.
> >
> > I would then expect the development of new, possibly simpler rules, that
> > worked off the unobfuscated cleanbody terms.
>
> Well, sort of.  I would actually do the reverse of what you describe.

I had been thinking in terms of a new rule type for the cleaned body, so as
to not break the current rules.  However, in light of it seemingly not being
that many rules, I suppose it doesn't matter much.  (I'm a little queasy
about possibly lots of 3rd party rules suddenly 'breaking', though.)

> So I would add either a new rule type (not thrilled
> about that), or perhaps a tflag which specifies what type of text the
> body rule is supposed to get.

In passing, I don't understand the general reluctance to add new rule types.
To me this seems incredibly cleaner and more obvious than crufting things up
with overloaded meanings by using cryptic flags that people will forget how
to spell.

Did SA at some point in the past have dozens of rule types and go through a
cleanup phase?  Or was there some other bad experience in the past with too
many rule types?  (My experience with SA only goes back to 2.6, so I don't
know ancient history.)

From where I sit looking at 2.6/3.0, I'd personally vote in favor of
doubling or tripling the number of available rule types before I even
thought about being concerned at the number of different types.  (But then,
I'd also be inclined to code PMS to not generate most of the rule type
sources unless it was known that there was at least one test on that rule
type.  This seems pretty trivial to do the way the main rules evaluation
works, last time I looked.)

> That's not very efficient anyway.  It's already flagged internally what is
> visible and what isn't, we would just need to export a general "there was
> invisible text" flag.

I think we are both saying the same thing, I just didn't use the right
words.

> However, the rule is horrible as a spam detector:
>
>   4.187   4.0333   5.2863    0.433   0.00    0.01  T_HTML_INVIS_TEXT

That amazes me.  I wonder what kind of things are invisible in that ham
mail?  Are these newsletter type things, or Word HTML output?  Or is there
just normally some hidden text in most all HTML?  Maybe there are 2-3 really
common hidden things in HTML, and after excepting them, the results would
improve?
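
For reference, the S/O figure in the quoted T_HTML_INVIS_TEXT line can be reproduced with a little arithmetic. This is only a minimal sketch: it assumes the columns follow the usual hit-frequencies order (OVERALL%, SPAM%, HAM%, S/O, RANK, SCORE, NAME) and that S/O is spam% / (spam% + ham%). The function names are made up for illustration and are not part of SpamAssassin.

```python
def parse_freq_line(line):
    """Split one frequency line into named fields (column order is assumed)."""
    overall, spam, ham, so, rank, score, name = line.split()
    return {
        "overall": float(overall),  # % of all mail hit
        "spam": float(spam),        # % of spam hit
        "ham": float(ham),          # % of ham hit
        "so": float(so),            # reported S/O ratio
        "rank": float(rank),
        "score": float(score),
        "name": name,
    }


def spam_over_overall(spam_pct, ham_pct):
    # Assumed definition: share of the rule's hit rate that comes from spam.
    return spam_pct / (spam_pct + ham_pct)


row = parse_freq_line("4.187 4.0333 5.2863 0.433 0.00 0.01 T_HTML_INVIS_TEXT")
so = spam_over_overall(row["spam"], row["ham"])
print(f"{row['name']}: S/O = {so:.3f}")  # prints "T_HTML_INVIS_TEXT: S/O = 0.433"
```

Under those assumptions, an S/O below 0.5 means the rule hits a larger fraction of ham than of spam, which fits the "horrible as a spam detector" remark.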
