http://bugzilla.spamassassin.org/show_bug.cgi?id=4636
------- Additional Comments From [EMAIL PROTECTED]  2005-10-18 14:53 -------
(In reply to comment #6)
> Earlier in the ticket you were talking about header normalization.  Body
> normalization is a different beast (but it's easier to deal with imo).

They are both components of this enhancement.  The header argument is easier
to make, though Node::rendered() has a similar argument since charset
normalization has to happen before it feeds the text into HTML::Parser.

> It's worth noting that this is actually going to be a much larger issue
> than just having a plugin, btw.  The main problem is that SpamAssassin
> very specifically disables unicode in every module via "use bytes"
> (according to the svn log it looks like it was added in at r3997 back
> in December 2002).

I got rid of a bunch of those in r315047.  I should audit the remaining ones.

> I was thinking that the plugin would be called by check_start, then
> get an array of parts via find_parts(), then do any manipulation of
> the data as required per-part (either dealing with the decoded or the
> rendered portions, or both).

Something like this would make sense if normalization/rendering were done
outside Message::Node.  It would be harder to do lazy normalization of
headers that way.

> Potentially, there'd be a new function in Message like
> "clear_rendered_cache" or something which would delete the cached forms
> of text_rendered, text_visible_rendered, text_invisible_rendered, and
> (if necessary/different function) text_decoded.

If the cache got filled before the normalization plugin got invoked, that
would indicate a bug in the order of execution.

> It's not very clean from an OO perspective.  Arguably we'd always want
> to make sure the message is in utf-8 format internally, and so the code
> could just be in Message::Node.

There's no clean OO separation between data and view, but that was
preexisting--Message::Node already knows almost everything about
SpamAssassin's view of MIME entities.
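
To make the lazy-normalization and cache-ordering points concrete, here is a minimal sketch. It is not SpamAssassin code and is written in Python rather than Perl purely for illustration. The class `Node`, the methods `normalized()` and `clear_cache()`, the `decode_calls` counter, and the UTF-8 fallback for unknown charsets are all hypothetical names and assumptions, not anything in Message::Node.

```python
import codecs
from typing import Optional


class Node:
    """Hypothetical stand-in for a MIME part (not Message::Node)."""

    def __init__(self, raw: bytes, charset: Optional[str] = None):
        self.raw = raw                # undecoded bytes as received
        self.charset = charset        # declared charset, may be missing or bogus
        self._normalized = None       # cache, filled on first access only
        self.decode_calls = 0         # instrumentation for this example

    def _pick_codec(self) -> str:
        # Assumption: fall back to UTF-8 when the declared charset is
        # absent or unknown to the codec registry.
        if not self.charset:
            return "utf-8"
        try:
            return codecs.lookup(self.charset).name
        except LookupError:
            return "utf-8"

    def normalized(self) -> str:
        # Lazy: decoding cost is paid only if some rule asks for the text.
        if self._normalized is None:
            self.decode_calls += 1
            self._normalized = self.raw.decode(self._pick_codec(),
                                               errors="replace")
        return self._normalized

    def clear_cache(self) -> None:
        # Analogue of the proposed "clear_rendered_cache": drop the cached
        # form so the next access decodes again.
        self._normalized = None


if __name__ == "__main__":
    node = Node("caf\u00e9".encode("iso-8859-1"), "ISO-8859-1")
    print(node.normalized(), node.decode_calls)
```

With normalization inside the node, nothing can populate the cache with un-normalized text, so a `clear_cache()` call is only needed if normalization is done by an external step that runs after the cache may already have been filled.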

There is the issue whether charset normalization should be:

A) a plugin
B) hardcoded but enabled/disabled by config
C) hardcoded, always on for sufficiently recent versions of Perl

Once I get a charset normalizer hooked up I can get some numbers on how much
it costs.  I was operating under the assumption that it would be too
expensive to enable for everybody.  I was thinking that having a plugin
allowed people more flexibility in tuning the normalization process, but
perhaps that's not strictly necessary.



------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
