https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7579
--- Comment #13 from Henrik Krohns <[email protected]> ---
Let's say some large PDF has a hundred unique "uris" for one reason or another. How would we manage this? Should we prefer to send them to URIBL queries instead of the body uris? Or shuffle and take some n uris from here and there? How will the different __URI* rules react, which depend on the count / number of hits?

I'm quite sceptical that even ExtractText makes any sense. It has the same problems, along with possibly filling Bayes with semi-random stuff from badly OCR'd images or wonkily rendered PDFs etc.

I think I would just vote to have a pdf_has_uri() which can match uris from PDFs and that's it. No complex metadata hassles.
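
To make the selection question concrete, here is a minimal sketch of the options raised above (prefer body uris, prefer PDF uris, or shuffle and take n). It is written in Python purely for illustration and is not SpamAssassin code; `pick_uris`, its `policy` names and the `limit` parameter are all hypothetical.

```python
import random

def pick_uris(body_uris, pdf_uris, limit, policy="body_first", seed=None):
    """Return at most `limit` unique uris for lookup (hypothetical helper)."""
    # Deduplicate each source, keeping first-seen order.
    body = list(dict.fromkeys(body_uris))
    in_body = set(body)
    # A uri seen in both places is counted once, as a body uri.
    pdf = [u for u in dict.fromkeys(pdf_uris) if u not in in_body]

    if policy == "body_first":
        # Body uris keep priority; PDF uris only fill the remaining slots.
        return (body + pdf)[:limit]
    if policy == "pdf_first":
        # PDF uris are queried in preference to body uris.
        return (pdf + body)[:limit]
    if policy == "shuffle":
        # Take n uris "from here and there": shuffle the pool, then cut.
        pool = body + pdf
        random.Random(seed).shuffle(pool)
        return pool[:limit]
    raise ValueError(f"unknown policy: {policy}")
```

Whichever policy is used, a PDF with a hundred uris changes which uris get queried and how many are seen in total, which is the concern for the count-dependent __URI* rules.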
