http://bugzilla.spamassassin.org/show_bug.cgi?id=3661
------- Additional Comments From [EMAIL PROTECTED] 2005-03-13 18:49 -------
Subject: Re: Request for HTML de-obfuscation of invisible SPAN's

> 4) any text that is invisible is no longer added to the default "rendered" text
> array.
>
> #4 is the biggie here, imo. It does make the rendered output more "correct"
> IMO, but most spam hits go down (range rules tend to push results towards the
> top of the range though) without significantly changing the ham hits.

This is interesting. It would seem to imply that the rules have become tailored not to detect the original evil words, but to detect various obfuscated flavors of those words. My guess is that by removing some of the obfuscation you cause fewer obfuscation hits. Note that I'm lumping rules like longwords and tripwire in with the obfuscation rules, since they tend to detect the sort of garbage used for obfuscation.

I suspect that if you extended your rendering change to treat 0pt text as invisible (or text under 2pt, for that matter), and perhaps tried to catch white-on-white text, the spam hits would go down even more with the current rules.

I'm not convinced it is wrong either that the spam hits go down or that the garbage is removed. I do think, though, that some indication of the garbage, if not the garbage itself, may need to be provided to rules or otherwise accounted for.

I can almost see having two body types: body, which uses the current rendering, and cleanbody, which uses the cleaned-up rendering. Doing that would not affect the current spam hits, since 'body' would remain the same. You would probably also need to provide an internal term that could be used in a meta rule to indicate that invisible text was removed. Since rules can't compare the body and cleanbody text directly, it would otherwise be difficult to tell whether any garbage got removed.

I would then expect the development of new, possibly simpler rules that work off the unobfuscated cleanbody text.
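A minimal sketch of the body/cleanbody split proposed above, assuming the HTML has already been rendered into runs of text with their effective styling. This is Python for illustration only, not SpamAssassin's (Perl) renderer or API. `TextSegment`, `is_invisible`, `render`, and the `invisible_removed` flag are hypothetical names. The visibility test uses the heuristics mentioned in this comment (explicitly hidden, font under 2pt, foreground colour equal to background colour) and ignores real CSS inheritance and colour normalization.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TextSegment:
    """One run of rendered HTML text plus the styling that applied to it (hypothetical model)."""
    text: str
    hidden: bool = False        # e.g. display:none or visibility:hidden
    font_size_pt: float = 12.0
    fg_color: str = "#000000"
    bg_color: str = "#ffffff"


def is_invisible(seg: TextSegment, min_size_pt: float = 2.0) -> bool:
    """Treat a run as invisible if it is hidden, tiny, or the same colour as its background."""
    if seg.hidden:
        return True
    if seg.font_size_pt < min_size_pt:
        return True
    # Naive white-on-white check: exact string match after lowercasing.
    return seg.fg_color.lower() == seg.bg_color.lower()


def render(segments: List[TextSegment]) -> Tuple[str, str, bool]:
    """Return (body, cleanbody, invisible_removed).

    body keeps every run, like the current rendering; cleanbody drops invisible
    runs; invisible_removed is the hypothetical flag a meta rule could test,
    since rules cannot compare body and cleanbody themselves.
    """
    # Inline spans are concatenated without separators, as in rendered HTML.
    body = "".join(s.text for s in segments)
    visible = [s.text for s in segments if not is_invisible(s)]
    cleanbody = "".join(visible)
    invisible_removed = len(visible) < len(segments)
    return body, cleanbody, invisible_removed
```

For input like `V<span style="font-size:0pt">qq</span>iagra`, the segments `[TextSegment("V"), TextSegment("qq", font_size_pt=0.0), TextSegment("iagra")]` render to a body of `Vqqiagra` and a cleanbody of `Viagra`, with the flag set. Existing body rules keep seeing the obfuscated form, while new cleanbody rules can match the plain word.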
