Everything written there is wrong.

SA Bayes uses $pms->get_decoded_stripped_body_text_array(), which returns
the text that is meant to be displayed to the user by an MUA, with any
text/html part rendered to text.

So use the stripped function, unless your engine handles MIME multipart,
HTML rendering, etc. by itself.

get_decoded_stripped_body_text_array is what 'body' rules process
get_decoded_body_text_array is what 'rawbody' rules process
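
As a rough illustration of the difference, here is a minimal sketch of a
standalone script that fetches both views of the same message. The helper
name body_views and the sample message are made up for this example, and
local_tests_only is set only to keep the check off the network. Treat it as
a sketch under those assumptions, not as how SA wires this up internally.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Mail::SpamAssassin;

# Hypothetical helper (not part of SpamAssassin): returns both body views
# of one raw message as two array references.
sub body_views {
  my ($raw_message) = @_;

  my $sa   = Mail::SpamAssassin->new({ local_tests_only => 1 });
  my $mail = $sa->parse($raw_message);
  my $pms  = $sa->check($mail);

  # Both methods return an array reference; copy the lines out before
  # finish() releases the underlying objects.
  my @stripped = @{ $pms->get_decoded_stripped_body_text_array() }; # 'body' view
  my @decoded  = @{ $pms->get_decoded_body_text_array() };          # 'rawbody' view

  $pms->finish();
  $mail->finish();
  return (\@stripped, \@decoded);
}

# Made-up sample: a single-part text/html message.
my $sample = join("\n",
  'From: sender@example.com',
  'To: rcpt@example.com',
  'Subject: demo',
  'MIME-Version: 1.0',
  'Content-Type: text/html; charset=us-ascii',
  '',
  '<html><body><p>Hello <b>world</b></p></body></html>',
  '');

my ($stripped, $decoded) = body_views($sample);
print "stripped (what 'body' rules see):\n", @$stripped, "\n";
print "decoded (what 'rawbody' rules see):\n", @$decoded, "\n";
```

The decoded view should still carry the HTML markup, while the stripped view
should hold only the rendered text, which is why the stripped one is the
safer input for an engine that does not render HTML itself.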

I've written detailed info about the rule types here:
https://cwiki.apache.org/confluence/display/spamassassin/WritingRulesAdvanced

The PerMsgStatus docs are quite poor in this regard; I tried to describe
these functions a bit more in the current SVN version.

Cheers,
Henrik

On Wed, Aug 21, 2019 at 03:12:22AM +0530, Shreyansh Shrivastava. wrote:
> Hey Kris, 
> Thanks for the pointer. Will try to accommodate both the sections.
> 
> Also, I found the answer. $pms->get_decoded_body_text_array() returns an array
> of strings where each string represents one newline-separated line of the
> body. Also, since the newline gets converted into <br> in text/html, the whole
> text/html part becomes the last element of the array. Using pop() on the array
> will leave you with only the text/plain part.
> 
> Thanks,
> Shreyansh Shrivastava
> 
> 
> On Wed, Aug 21, 2019 at 3:06 AM Kris Deugau wrote:
> 
>     Shreyansh Shrivastava. wrote:
>     > I wanted to process only the text/plain part of the mail hence I was
>     > looking for a sub in SA. The closest I could get was
>     > $pms->get_decoded_body_text_array () which returns an array of strings
>     > comprising both text/plain and text/html part of the mail.
>     >
>     > Is there any other way of retrieving the text/plain part only?
> 
>     I can't really answer what you're asking, but I will point out that the
>     text/plain part is often empty or at least different from the text/html
>     part - on both spam and ham.  Looking only at the text/html would be
>     slightly better, but using both would be better still.
> 
>     The HTML formatting/structure itself is often valuable for spam signs
>     too, on top of whatever readable text content it contains.
> 
>     -kgd
