https://bz.apache.org/SpamAssassin/show_bug.cgi?id=6439
--- Comment #24 from Kent Oyer <kent.o...@gmail.com> ---

I had high hopes that using extracttext.pm would be an easy fix, but I've discovered three main problems with this method:

1. extracttext.pm stores its output in the rendered body part via a call to `set_rendered`. This is great if you are extracting text from a PDF or image file. However, if you are using `cat` on an HTML part, you will end up with raw HTML in your rendered body. So really you need to use a tool like `html2text` that can render the HTML and extract the visible text so your body rules work correctly. That's no big deal, however...

2. Unfortunately, rawbody rules will not work at all, because rawbody rules run against the text returned by `get_decoded_body_text_array`. That function only returns the contents of text/* and message/* parts, so an HTML attachment sent under any other MIME type is never seen by them.

3. Lastly, extracttext.pm doesn't add discovered URIs to the URI detail list.

I think the goal is to treat all HTML attachments the same, regardless of the MIME type. I've attached a patch file that does just that and also includes a new test case.

Let me know if you have any questions.

Respectfully,
Kent
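
To illustrate point 2, here is a minimal sketch in Python, using only the standard `email` package. It is not SpamAssassin code and not the attached patch. It only mimics the text/* and message/* filter described above and shows how an HTML attachment labelled `application/octet-stream` slips past it. The helper names (`build_sample`, `text_only_parts`, `looks_like_html`, `html_parts`) and the filename-extension check are assumptions made up for this example; the patch may detect HTML differently.

```python
from email.message import EmailMessage


def build_sample():
    """Build a message whose HTML attachment carries a non-text MIME type."""
    msg = EmailMessage()
    msg["Subject"] = "Invoice"
    msg.set_content("See the attached file.")
    html = b"<html><body><a href='http://example.com/login'>Click here</a></body></html>"
    # The attachment is HTML, but it is declared as application/octet-stream.
    msg.add_attachment(
        html,
        maintype="application",
        subtype="octet-stream",
        filename="invoice.html",
    )
    return msg


def text_only_parts(msg):
    """Mimic the filter from point 2: keep only text/* and message/* parts."""
    return [p for p in msg.walk() if p.get_content_maintype() in ("text", "message")]


def looks_like_html(part):
    """Hypothetical check: declared as text/html, or named *.htm / *.html."""
    name = (part.get_filename() or "").lower()
    return part.get_content_type() == "text/html" or name.endswith((".htm", ".html"))


def html_parts(msg):
    """Select leaf parts that look like HTML, regardless of declared MIME type."""
    return [p for p in msg.walk() if not p.is_multipart() and looks_like_html(p)]


if __name__ == "__main__":
    sample = build_sample()
    print([p.get_content_type() for p in text_only_parts(sample)])
    print([p.get_filename() for p in html_parts(sample)])
```

With the type-based filter, only the plain-text body is selected, so a rule that scans those parts never sees the link inside `invoice.html`. Selecting by what the part looks like, rather than by its declared type, is one way to picture "treat all HTML attachments the same".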