srujana-kuntumalla opened a new pull request, #2892: URL: https://github.com/apache/tika/pull/2892
## Summary

- Adds `PDF.HAS_SIGNATURE_FIELDS` (`pdf:hasSignatureFields`) metadata property to report the presence of AcroForm `/FT /Sig` fields, regardless of whether a signature has been applied
- Refactors `PDFParser.extractSignatures()` to iterate over `PDDocument.getSignatureFields()` instead of `getSignatureDictionaries()`, so unsigned signature fields are detected; `TikaCoreProperties.HAS_SIGNATURE` is still only set when a field has an actual `PDSignature` applied
- Adds a minimal test PDF (`testPDF_unsigned_sig_field.pdf`) with an unsigned signature field and `/SigFlags 3`
- Updates `testSignatureInAcroForm` to assert the new property and tightens expectations; adds `testUnsignedSignatureField` for TIKA-4756

## Motivation

PDFs that contain AcroForm signature fields (`/FT /Sig`) but have not yet been signed were previously indistinguishable from PDFs with no signature infrastructure. This matters for downstream applications such as PDF/A converters (e.g. OCRmyPDF) that need to skip documents with signature fields to avoid invalidating future signatures.

## Test plan

- [x] `PDFParserTest#testUnsignedSignatureField`: new test asserting `pdf:hasSignatureFields=true` and no `HAS_SIGNATURE` on a PDF with an unsigned signature field
- [x] `PDFParserTest#testSignatureInAcroForm`: existing test updated to assert the new property; confirms no actual signature is set on `testPDF_acroform3.pdf`

Fixes: https://issues.apache.org/jira/browse/TIKA-4756

🤖 Generated with [Claude Code](https://claude.com/claude-code)
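For reviewers, here is a rough, dependency-free sketch of the distinction this PR draws between "has signature fields" and "has a signature". It is not the code in `PDFParser.extractSignatures()`: `SigField` is a hypothetical stand-in for a PDFBox signature field, and `signerName` is an illustrative placeholder for an applied `PDSignature` (null when the field is unsigned). Only the two-flag logic is meant to mirror the description above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SignatureFlagsSketch {

    /** Hypothetical model of an AcroForm /FT /Sig field; not a PDFBox or Tika type. */
    static final class SigField {
        final String fieldName;
        final String signerName; // assumption: null means no signature has been applied

        SigField(String fieldName, String signerName) {
            this.fieldName = fieldName;
            this.signerName = signerName;
        }

        boolean isSigned() {
            return signerName != null;
        }
    }

    /** Mirrors the intent of pdf:hasSignatureFields: true if any signature field exists. */
    static boolean hasSignatureFields(List<SigField> fields) {
        return !fields.isEmpty();
    }

    /** Mirrors the intent of HAS_SIGNATURE: true only if some field is actually signed. */
    static boolean hasSignature(List<SigField> fields) {
        for (SigField field : fields) {
            if (field.isSigned()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // A document with one unsigned signature field, like the new test PDF.
        List<SigField> unsigned = Arrays.asList(new SigField("Signature1", null));
        System.out.println("hasSignatureFields=" + hasSignatureFields(unsigned));
        System.out.println("hasSignature=" + hasSignature(unsigned));

        // A document with no signature infrastructure at all.
        List<SigField> none = Collections.emptyList();
        System.out.println("hasSignatureFields=" + hasSignatureFields(none));
        System.out.println("hasSignature=" + hasSignature(none));
    }
}
```

Iterating over fields rather than over applied signatures is what lets the first case differ from the second: both report no signature, but only the first reports signature fields.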
