http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5041
------- Additional Comments From [EMAIL PROTECTED]  2006-08-10 21:52 -------
> non-text noise parts

Exactly, and it produces an array of "short" lines, each limited to 2048 characters, so that rules are not overloaded and do not each have to deal with line length themselves.

Perhaps we should say that message text is logically a set of words, and instead of an array of short lines produce an array of short words.

OK, I've thought about this a bit more. I'm leaving the previous paragraph in place to show how I'm thinking about this:

The first attached example has a block of 76-character lines with no spaces. If we don't want to break up long URLs, there may be no "short" word length that would do us any good. Even worse, in the second example, with the uuencoded block, the lines are only 64 or so characters long and contain some embedded spaces, yet it still takes too long to process.

What we haven't done is profile the slow rules to see just what the bottleneck(s) is/are. If there is a bottleneck common to all of those rules, once we know what it is we can either handle it in message body processing or come up with a standard technique that such rules can use to avoid it.

If we can't do that, we may be stuck with devising a heuristic for detecting BASE64 and uuencoded blocks and not passing them through into the message body array. But then we have to be very careful that spammers can't trick the parser into letting through text that will still render OK in the mail client.
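For illustration only, here is a rough Python sketch of the two splitting strategies discussed in this comment: cutting rendered text into lines of at most 2048 characters, and the alternative of splitting it into short words. This is not SpamAssassin's actual Perl code. The names `split_long_lines` and `split_words` are hypothetical, and chunking over-long words at the same 2048 limit is an assumption.

```python
from typing import List

# Per-line limit mentioned in the comment; the helper names below are hypothetical.
MAX_LINE_LEN = 2048


def split_long_lines(text: str, max_len: int = MAX_LINE_LEN) -> List[str]:
    """Split text on newlines, then cut any line longer than max_len into chunks."""
    out: List[str] = []
    for line in text.split("\n"):
        if not line:
            out.append(line)  # keep blank lines as empty entries
            continue
        for start in range(0, len(line), max_len):
            out.append(line[start:start + max_len])
    return out


def split_words(text: str, max_len: int = MAX_LINE_LEN) -> List[str]:
    """Alternative idea: treat the body as whitespace-separated words, chunking long ones."""
    words: List[str] = []
    for word in text.split():
        for start in range(0, len(word), max_len):
            words.append(word[start:start + max_len])
    return words


if __name__ == "__main__":
    b64_line = "QUJD" * 19  # a 76-character base64-looking line with no spaces
    body = "\n".join([b64_line] * 3)
    # Each space-free line survives as a single "word", so nothing gets shorter.
    print([len(w) for w in split_words(body)])  # [76, 76, 76]
```

The demo shows the problem with the first attached example: a block of 76-character lines with no spaces comes back from `split_words` as one "word" per line, so word splitting alone does not shrink the input the rules have to scan.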

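The fallback mentioned in the last paragraph, detecting BASE64 and uuencoded blocks and keeping them out of the body array, might look roughly like this Python sketch. Everything here is an assumption for illustration: the function names, the `MIN_RUN` and `MIN_B64_LEN` thresholds, and the rule that only runs of several consecutive encoded-looking lines are flagged. The uuencode check relies on the standard line format: a length character followed by groups of 4 characters in the range 32 to 96. That range includes the space character, which is why uuencoded lines can contain embedded spaces.

```python
import re
from typing import List, Set

# Hypothetical thresholds, chosen for illustration only.
MIN_RUN = 5        # consecutive encoded-looking lines needed to call it a block
MIN_B64_LEN = 40   # shorter lines are too ambiguous to classify

B64_LINE = re.compile(r"[A-Za-z0-9+/]+={0,2}")


def looks_base64(line: str) -> bool:
    """True for long lines made only of base64 characters, length a multiple of 4."""
    line = line.rstrip("\r\n")
    return (len(line) >= MIN_B64_LEN and len(line) % 4 == 0
            and B64_LINE.fullmatch(line) is not None)


def looks_uuencoded(line: str) -> bool:
    """True if the line's length matches its uuencode length character."""
    line = line.rstrip("\r\n")
    if not line:
        return False
    n_bytes = (ord(line[0]) - 32) & 0x3F  # '`' encodes 0, like a space
    if n_bytes == 0:
        return False
    expected = 1 + 4 * ((n_bytes + 2) // 3)
    # Allow one extra trailing character; some encoders append a checksum.
    if len(line) not in (expected, expected + 1):
        return False
    return all(32 <= ord(c) <= 96 for c in line)


def encoded_block_lines(lines: List[str]) -> Set[int]:
    """Return indices of lines inside runs of >= MIN_RUN encoded-looking lines."""
    flagged: Set[int] = set()
    run_start = None
    for i, line in enumerate(lines + [""]):  # empty sentinel closes a trailing run
        if looks_base64(line) or looks_uuencoded(line):
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= MIN_RUN:
                flagged.update(range(run_start, i))
            run_start = None
    return flagged
```

Requiring a run of consecutive lines, rather than judging each line alone, is one way to avoid dropping an ordinary long token such as a URL. It does nothing by itself against the risk raised in the comment, that spammers could craft text which looks encoded to the parser but still renders in the mail client.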