[ https://issues.apache.org/jira/browse/PDFBOX-5528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620914#comment-17620914 ]
Andre Wachsmuth commented on PDFBOX-5528:
-----------------------------------------
Sadly, many customers only care whether the software produces a result that
passes automated tests. What kind of information would be required for the
flattened form fields, and can it be extracted from the existing AcroForm? Or
perhaps an argument could be added to PDAcroForm#flatten that lets us pass the
required information?
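
To make the question concrete, here is a minimal sketch in plain Java of the
operator sequences involved. It makes no PDFBox calls: the class, its methods,
the resource name Fm0 and the choice of structure type are all hypothetical
and for illustration only. As far as I understand it, marking the flattened
appearance as an artifact needs no extra information, whereas tagging it as
real content needs at least a structure type and an MCID that a structure
element must then reference.

```java
import java.util.Locale;

// Hypothetical helper, not part of PDFBox. It only builds the operator text
// that would surround a flattened field appearance in a page content stream.
final class MarkedContentSketch {

    // Option 1: mark the appearance as an artifact (BMC ... EMC).
    static String asArtifact(String formName, double[] ctm) {
        return "/Artifact BMC\n" + drawForm(formName, ctm) + "EMC\n";
    }

    // Option 2: tag the appearance as real content (BDC ... EMC).
    // The caller has to supply the structure type and the MCID.
    static String asTaggedContent(String structureType, int mcid,
                                  String formName, double[] ctm) {
        return "/" + structureType + " <</MCID " + mcid + ">> BDC\n"
                + drawForm(formName, ctm) + "EMC\n";
    }

    // q ... Q isolates the transformation; cm applies the six matrix
    // entries; Do paints the form XObject registered under formName.
    private static String drawForm(String formName, double[] ctm) {
        StringBuilder sb = new StringBuilder("q\n");
        for (double v : ctm) {
            sb.append(String.format(Locale.ROOT, "%.2f", v)).append(' ');
        }
        sb.append("cm\n/").append(formName).append(" Do\nQ\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        double[] ctm = {1, 0, 0, 1, 72, 700};
        System.out.print(asArtifact("Fm0", ctm));
        System.out.print(asTaggedContent("Form", 3, "Fm0", ctm));
    }
}
```

The second variant is the one that would need additional input from the
caller, which is why an extra argument to the flatten method might be
necessary.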
> PDF/UA: Add marked content sections when flattening acro forms
> --------------------------------------------------------------
>
> Key: PDFBOX-5528
> URL: https://issues.apache.org/jira/browse/PDFBOX-5528
> Project: PDFBox
> Issue Type: Improvement
> Components: AcroForm
> Reporter: Andre Wachsmuth
> Priority: Minor
> Attachments: correct.png, wrong.png
>
>
> We need to support PDF/UA compliant documents to some extent. I noticed that
> when we take a PDF/UA compliant PDF document and flatten it via
> PDAcroForm#flatten, the resulting output is no longer PDF/UA compliant.
> After a little bit of research, the problem is that PDFBox creates Do
> operators with paths representing the appearance of the form fields.
> According to the PDF/UA standard, such paths need to be enclosed in marked
> content sections (BMC ... EMC, BDC ... EMC, see attached images).
> By copying some code from PDAcroForm#flatten and adding
> contentStream.beginMarkedContent and contentStream.endMarkedContent myself, I
> can work around the problem, but that's less than ideal. It would be great if
> this could be included in PDFBox.
>
> {code:java}
> public void flatten(List<PDField> fields, boolean refreshAppearances) throws IOException {
>     // ...
>     final var dict = new COSDictionary();
>     dict.setLong(COSName.MCID, mcid);
>     dict.setItem(COSName.BBOX, bBox);
>     dict.setItem(COSName.TYPE, COSName.BACKGROUND);
>     final var propList = PDPropertyList.create(dict);
>     // Enclose the drawn appearance in a marked content section tagged as an artifact.
>     contentStream.beginMarkedContent(COSName.ARTIFACT, propList);
>     contentStream.saveGraphicsState();
>     // See https://stackoverflow.com/a/54091766/1729265 for an explanation
>     // of the steps required.
>     // This will transform the appearance stream form object into the
>     // rectangle of the annotation bbox and map the coordinate systems.
>     final var transformationMatrix =
>         pdfbox_resolveTransformationMatrix(form, annotation, appearanceStream);
>     contentStream.transform(transformationMatrix);
>     contentStream.drawForm(fieldObject);
>     contentStream.restoreGraphicsState();
>     contentStream.endMarkedContent();
>
>     // ...
> }{code}
>