[Bug 7579] PDFInfo: pdfinfo:pdf_has_uri

bugzilla-daemon Thu, 03 May 2018 23:40:25 -0700

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7579


Giovanni Bechis <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[email protected]

--- Comment #3 from Giovanni Bechis <[email protected]> ---
(In reply to John Hardin from comment #1)
> That a PDF has a URI (clickable or not) doesn't seem a terribly useful datum
> in isolation. I'd suggest it would be _much_ more useful to extract the URIs
> and add them to the pool that feeds uri rules and URIBL checks.
> 
Any hints on how to add URIs to the pool?
I had a look at DecodeShortURLs.pm, but it's ugly and I am not sure it works correctly.

> Even better if heuristics similar to what's used for body text would pull
> non-clickable URIs out of the PDF text, but doing that might best be
> controlled by a config option.
IMHO, this should be a second step.
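The first step John suggests is pulling clickable link targets out of the PDF so they can later be fed to the URI pool. The Python sketch below illustrates that step and is not a SpamAssassin or PDFInfo implementation. It assumes the link annotations sit in uncompressed objects and that the `/URI` value uses PDF literal-string syntax, for example `/URI (http://...)`. Compressed (FlateDecode) object streams, hex strings, and nested unescaped parentheses are not handled. The function name `extract_pdf_uris` is hypothetical. The sketch deliberately leaves open how the URIs would then be added to the pool.

```python
import re

# Matches a URI action entry such as: /URI (http://example.com/path)
# Assumption: literal-string syntax only; hex strings (<...>) and nested,
# unescaped parentheses inside the string are NOT handled by this sketch.
URI_ENTRY = re.compile(rb"/URI\s*\(((?:\\.|[^\\()])*)\)")


def extract_pdf_uris(data: bytes) -> list:
    """Return link URIs found in /URI entries of raw, uncompressed PDF bytes.

    Hypothetical helper for illustration; compressed object streams
    would have to be inflated before this could see their contents.
    """
    uris = []
    for match in URI_ENTRY.finditer(data):
        raw = match.group(1)
        # Undo the simplest PDF literal-string escapes: \( \) and \\
        raw = re.sub(rb"\\([()\\])", rb"\1", raw)
        # latin-1 maps every byte to a code point, so decoding never fails
        uris.append(raw.decode("latin-1"))
    return uris
```

For example, an annotation whose action dictionary is `<< /S /URI /URI (http://example.com/a\(b\)) >>` yields `http://example.com/a(b)`. The `/S /URI` action type is skipped because it is not followed by a string.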


