[Bug 7579] PDFInfo: pdfinfo:pdf_has_uri

bugzilla-daemon Tue, 13 Apr 2021 13:50:01 -0700

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7579


--- Comment #14 from Giovanni Bechis <[email protected]> ---
(In reply to Henrik Krohns from comment #12)
> (In reply to Giovanni Bechis from comment #7)
> > 
> > Extract URIs from pdf files (at least some of them) and add them to the pool
> > of URIs to be checked (URIBL, etc...).
> 
> We have ExtractText.pm too, so which is better tool for the job? How will we
> manage things in future when we have 10 plugins all adding some metadata? Do
> we actually want "uri" or URIBL to match _anything_ and how do we manage on
> per-rule basis which sources should be used?

IMHO ExtractText.pm is more OCR-oriented and covers more than just PDF files,
while PDFInfo.pm is more about attached PDF file names and other info strictly
related to PDFs. Maybe they could be merged, but I do not think it's worth the
effort.
