https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8107

--- Comment #3 from Kent Oyer <kent.o...@gmail.com> ---
I decided to write a new plugin because maintaining backward compatibility
with the old one would have been too difficult. The existing PDFInfo plugin
was a spinoff of the ImageInfo plugin, which probably explains why it focuses
so much on image dimensions and pixel area.

I've made the plugin available on GitHub in case anyone wants to use it.

https://github.com/mxguardian/Mail-SpamAssassin-Plugin-PDFInfo2

Feedback and suggestions are appreciated. I'm using this in production without
any problems, but all the standard warnings and disclaimers apply. You can run
this plugin in parallel with the old plugin if you have any rules that depend
on the old one.

Notable improvements:

* It can parse PDFs that are encrypted with a blank password

* Several of the tests focus exclusively on page 1 of each document. This not
only helps with performance but is also a countermeasure against content
stuffing.

* pdf2_click_ratio - Fires based on how much of page 1 is clickable. Based on
preliminary testing, anything over 20% is likely spam, especially if there's
only one link and the word count is low. 

* I took the liberty of creating a new "pdf" URI type that can be used in
writing uri-detail rules.
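
To make the click-ratio idea above concrete, here is a rough Python sketch of
the arithmetic. It is only an illustration, not the plugin's actual code: the
function names, the rectangle representation, and the `max_words` cutoff are
all assumptions made for the example. Only the 20% threshold and the "one
link, low word count" hint come from the description above.

```python
from typing import List, Tuple

# Hypothetical representation: (x0, y0, x1, y1) in PDF points.
Rect = Tuple[float, float, float, float]


def rect_area(r: Rect) -> float:
    """Area of an axis-aligned rectangle, regardless of corner order."""
    x0, y0, x1, y1 = r
    return abs(x1 - x0) * abs(y1 - y0)


def click_ratio(page: Rect, links: List[Rect]) -> float:
    """Fraction of the page covered by link rectangles.

    Simplification: areas are summed, so overlapping links are counted
    twice; the result is capped at 1.0 to stay a valid ratio.
    """
    page_area = rect_area(page)
    if page_area == 0:
        return 0.0
    clickable = sum(rect_area(r) for r in links)
    return min(clickable / page_area, 1.0)


def classify(ratio: float, link_count: int, word_count: int,
             max_words: int = 50) -> str:
    """Apply the heuristic: over 20% clickable is likely spam, more so
    with a single link and few words. max_words is an assumed cutoff;
    the original description gives no number for "low"."""
    if ratio <= 0.20:
        return "ok"
    if link_count == 1 and word_count <= max_words:
        return "very suspicious"
    return "suspicious"


if __name__ == "__main__":
    letter_page = (0.0, 0.0, 612.0, 792.0)      # US Letter in points
    one_big_link = [(0.0, 0.0, 612.0, 396.0)]   # covers half the page
    ratio = click_ratio(letter_page, one_big_link)
    print(ratio, classify(ratio, len(one_big_link), word_count=10))
```

A page that is half covered by a single link with only a handful of words
lands in the strongest bucket, while a text-heavy page with several small
links stays below the threshold.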

Let me know if you have any questions.

-Kent

