[
https://issues.apache.org/jira/browse/TIKA-4756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18088454#comment-18088454
]
Willy T. Koch commented on TIKA-4756:
-------------------------------------
Thanks for the insight!
I tested PDFBox and got the navigator to open the file, but it has an OS font
issue that makes it unusable. I used Claude Code on it instead, which did the
same analysis with a Python script it generated.
It confirms what you both said: all signature fields are empty, so in that
sense it's correct that Tika reports hasSignature = false.
A useful addition would be a property that flags these signature fields, such
as signature:containsFields:true or something similar.
But as you've already spent plenty of time on this, you can also decide to just
close it and we'll add some custom code to flag these, as first suggested.
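A rough sketch of what such custom flagging could look like on the consumer side. It assumes the count of /Sig fields comes from a separate PDF inspection step, that metadata values are the strings "true"/"false" as in the Tika JSON output quoted below, and that signature:containsFields is only the name proposed in this thread, not an existing Tika property. Both function names are hypothetical.

```python
from typing import Mapping

# Proposed in this thread only; not an existing Tika property.
CONTAINS_FIELDS_KEY = "signature:containsFields"
# Key name as written in this thread.
HAS_SIGNATURE_KEY = "hasSignature"


def flag_signature_fields(metadata: Mapping[str, str], sig_field_count: int) -> dict:
    """Return a copy of the metadata with the proposed flag added."""
    flagged = dict(metadata)
    flagged[CONTAINS_FIELDS_KEY] = "true" if sig_field_count > 0 else "false"
    return flagged


def should_skip_pdfa_conversion(metadata: Mapping[str, str]) -> bool:
    """Skip reconversion if the PDF is signed or merely contains /Sig fields."""
    keys = (HAS_SIGNATURE_KEY, CONTAINS_FIELDS_KEY)
    return any(str(metadata.get(key, "false")).lower() == "true" for key in keys)
```

With the sample file, Tika's own metadata would not trigger a skip, while the flagged copy (7 /Sig fields) would.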
h3. /Sig Fields — 7 total, all unsigned
||#||Field name||Status||
|1|{{2 - E-Signature of applicant}}|Empty (not signed)|
|2|{{6 - 8 E-Signature Head of Training type rating or NPCT extension to SPO}}|Empty|
|3|{{9 - 10 E-signature}}|Empty|
|4|{{12 - 3 E-Signature of applicant}}|Empty|
|5|{{13 - 3 E-signature of TRI}}|Empty|
|6|{{16 - 6 E-Signature}}|Empty|
|7|{{17 - 3 E-signature}}|Empty|
*None of the signature fields have been signed.* Each {{/V}} entry is absent:
the fields are present in the AcroForm structure but contain no cryptographic
signature data (no {{/Filter}}, {{/SubFilter}}, {{/ByteRange}},
{{/Contents}}, etc.).
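The check described above can be sketched roughly as follows. This is not the script that produced the table; it is a minimal illustration that assumes the field dictionaries have already been read from the AcroForm by some PDF library and are represented as plain Python dicts. The function names and the three status labels are made up for the example.

```python
from typing import Any, Iterable, Mapping

# Entries a signed signature dictionary is expected to carry
# (a subset of the keys listed above).
REQUIRED_SIG_KEYS = ("/Contents", "/ByteRange")


def classify_sig_field(field: Mapping[str, Any]) -> str:
    """Classify one field dict as 'not-sig', 'empty', 'incomplete' or 'signed'.

    Simplification: /FT and /V are read from the field itself; values
    inherited from a parent field are not resolved here.
    """
    if field.get("/FT") != "/Sig":
        return "not-sig"
    value = field.get("/V")
    if value is None:
        return "empty"  # /Sig field exists but was never signed
    if isinstance(value, Mapping) and all(k in value for k in REQUIRED_SIG_KEYS):
        return "signed"
    return "incomplete"  # /V present but lacks signature data


def summarize(fields: Iterable[Mapping[str, Any]]) -> dict:
    """Count /Sig fields and how many of them carry signature data."""
    statuses = [classify_sig_field(f) for f in fields]
    sig_total = sum(1 for s in statuses if s != "not-sig")
    signed = statuses.count("signed")
    return {"sig_fields": sig_total, "signed": signed, "has_signature": signed > 0}
```

For a document like the sample, seven /Sig fields without {{/V}} would give sig_fields = 7, signed = 0 and has_signature = False.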
On page 4, the blue signature shown in the signature field is just an inserted
image.
*{{9 - 10 E-signature}} ({{/FT=/Sig}}, {{/V=None}})*: this is the
designated cryptographic e-signature field for the examiner, and it is
*empty/unsigned*, exactly like all the other {{/Sig}} fields.
In short: *yes, it's just an embedded image* — a picture of a handwritten
signature baked into the page content, with no cryptographic binding
whatsoever. The actual {{/Sig}} field {{9 - 10 E-signature}} sitting right next
to it in the same row is still unsigned. The image carries zero legal/technical
signature value from a PDF signature standpoint.
> Detecting Signatures in PDFs with AcroForm
> ------------------------------------------
>
> Key: TIKA-4756
> URL: https://issues.apache.org/jira/browse/TIKA-4756
> Project: Tika
> Issue Type: Improvement
> Components: metadata
> Reporter: Willy T. Koch
> Priority: Minor
> Labels: Signature
> Attachments: image-2026-06-11-18-05-01-275.png, sigflags_sample.pdf,
> signature.png
>
>
> We see that PDFs with an AcroForm that contains signature (/Sig) fields
> aren't detected by the /meta analysis. It detects the AcroForm with
> "pdf:hasAcroFormFields": "true", but reports nothing on the /Sig part. They
> are created directly in Adobe Acrobat, which is also possible in the free
> version. It would be very useful to also return "hasSignature": "true" (or
> some other signature: property) for these kinds of files, so we can handle it
> on our end. We use this to exclude PDFs with digital signatures from being
> reconverted to PDF/A.
>
> When I run it through the OCRmyPDF, it flags it as digitally signed and
> exits, which is how I first noticed it.
> _ocrmypdf sigflags_sample.pdf sigflags_sample_ocrmypdf.pdf_
> _DigitalSignatureError: Input PDF has a digital signature. OCR would alter
> the document,_
> _invalidating the signature._
>
> I've attached a small sample PDF with AcroForm and Signature to reproduce the
> issue.
>
> Willy T. Koch
> Technical Product manager,
> Public 360°
> Norway