Willy T. Koch created TIKA-4756:
-----------------------------------
Summary: Detecting Signatures in PDFs with AcroForm
Key: TIKA-4756
URL: https://issues.apache.org/jira/browse/TIKA-4756
Project: Tika
Issue Type: Improvement
Components: metadata
Reporter: Willy T. Koch
Attachments: sigflags_sample.pdf
We see that PDFs whose AcroForm contains a signature field (/Sig) are not
detected as signed by the /meta analysis. It detects the AcroForm with
"pdf:hasAcroFormFields": "true", but reports nothing about the /Sig part. Such
files can be created directly in Adobe Acrobat, including the free version.
It would be very useful to also return "hasSignature": "true" (or some other
signature-related property) for these kinds of files, so we can handle it on
our end. We use this to exclude PDFs with digital signatures from being
reconverted to PDF/A.
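Until such a property exists, a client-side stopgap could be a raw byte scan
for the signature markers. The following is only a rough Python sketch, not how
Tika or OCRmyPDF do it: the function name `looks_signed` and the result keys
are made up for illustration, and it assumes the AcroForm and field
dictionaries are stored uncompressed. PDFs that keep them inside compressed
object streams would be missed, so a real check should use a PDF parser.

```python
import re

# "/FT /Sig" marks a form field whose field type is Signature.
# PDF syntax allows the whitespace between the two names to be omitted.
SIG_FIELD = re.compile(rb"/FT\s*/Sig\b")

# "/SigFlags <int>" is an entry of the AcroForm dictionary.
# Bit 1 (value 1) is SignaturesExist, bit 2 (value 2) is AppendOnly.
SIG_FLAGS = re.compile(rb"/SigFlags\s+(\d+)")


def looks_signed(pdf_bytes: bytes) -> dict:
    """Heuristic scan of raw PDF bytes for signature markers (hypothetical helper)."""
    flags = [int(value) for value in SIG_FLAGS.findall(pdf_bytes)]
    return {
        # True if any field dictionary declares the Signature field type.
        "hasSignatureField": SIG_FIELD.search(pdf_bytes) is not None,
        # True if any /SigFlags value has the SignaturesExist bit set.
        "signaturesExist": any(flag & 1 for flag in flags),
    }
```

It could be called as `looks_signed(open("sigflags_sample.pdf", "rb").read())`.
Note that an empty, not yet signed signature field also has /FT /Sig, so
"hasSignatureField" alone does not prove the document carries a signature.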
When I run the file through OCRmyPDF, it flags it as digitally signed and
exits, which is how I first noticed the issue.
_ocrmypdf sigflags_sample.pdf sigflags_sample_ocrmypdf.pdf_
_DigitalSignatureError: Input PDF has a digital signature. OCR would alter the
document,_
_invalidating the signature._
I've attached a small sample PDF with AcroForm and Signature to reproduce the
issue.
Willy T. Koch
Technical Product manager,
Public 360°
Norway