[ https://issues.apache.org/jira/browse/PDFBOX-5528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18012668#comment-18012668 ]
Lonzak commented on PDFBOX-5528:
--------------------------------
+1 for this issue...
> PDF/UA: Add marked content sections when flattening acro forms
> --------------------------------------------------------------
>
> Key: PDFBOX-5528
> URL: https://issues.apache.org/jira/browse/PDFBOX-5528
> Project: PDFBox
> Issue Type: Improvement
> Components: AcroForm
> Reporter: Andre Wachsmuth
> Priority: Minor
> Attachments: correct.png, wrong.png
>
>
> We need to support PDF/UA compliant documents to some extent. I noticed that
> when we take a PDF/UA compliant PDF document and flatten it via
> PDAcroForm#flatten, the resulting output is no longer PDF/UA compliant.
> After a little bit of research, I found that the problem is that PDFBox
> writes Do operators that paint the appearance streams of the form fields
> into the page content without enclosing them in marked content. According
> to the PDF/UA standard, such content needs to be enclosed in marked content
> sections (BMC ... EMC or BDC ... EMC, see attached images).
> By copying some code from PDAcroForm#flatten and adding
> contentStream.beginMarkedContent and contentStream.endMarkedContent myself, I
> can work around the problem, but that's less than ideal. It would be great if
> this could be included in PDFBox.
>
> {code:java}
> public void flatten(List<PDField> fields, boolean refreshAppearances) throws IOException
> {
>     // ...
>     final var dict = new COSDictionary();
>     dict.setLong(COSName.MCID, mcid);
>     dict.setItem(COSName.BBOX, bBox);
>     dict.setItem(COSName.TYPE, COSName.BACKGROUND);
>     final var propList = PDPropertyList.create(dict);
>     // open an /Artifact marked-content section around the flattened field
>     contentStream.beginMarkedContent(COSName.ARTIFACT, propList);
>     contentStream.saveGraphicsState();
>     // see https://stackoverflow.com/a/54091766/1729265 for an explanation
>     // of the steps required
>     // this will transform the appearance stream form object into the
>     // annotation's rectangle and map the coordinate systems
>     final var transformationMatrix =
>         pdfbox_resolveTransformationMatrix(form, annotation, appearanceStream);
>     contentStream.transform(transformationMatrix);
>     contentStream.drawForm(fieldObject);
>     contentStream.restoreGraphicsState();
>     // close the marked-content section (EMC)
>     contentStream.endMarkedContent();
>
>     // ...
> }{code}
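The comments in the reporter's snippet point to a StackOverflow answer for the steps that map an appearance stream onto its widget. As a rough sketch, assuming the appearance-stream algorithm from ISO 32000-1, section 12.5.5 (transform the appearance /BBox by its /Matrix, take the smallest enclosing upright rectangle, then scale and translate that rectangle onto the annotation's /Rect), the matrix passed to `contentStream.transform` could be computed with plain arrays as below. `AppearanceMapping`, `transformedBBox` and `fitToRect` are hypothetical names; this is neither PDFBox API nor PDFBox's own implementation. Rectangles are assumed normalized (`{llx, lly, urx, ury}` with non-zero width and height) and matrices use PDF's `{a, b, c, d, e, f}` order.

```java
/**
 * Hypothetical helper (not PDFBox API) following the appearance-stream steps of
 * ISO 32000-1, 12.5.5. Matrices are {a, b, c, d, e, f}; rectangles are
 * {llx, lly, urx, ury} and assumed normalized with non-zero width and height.
 */
final class AppearanceMapping {

    private AppearanceMapping() {
    }

    /** Maps (x, y) through m: x' = a*x + c*y + e, y' = b*x + d*y + f. */
    static double[] apply(double[] m, double x, double y) {
        return new double[] {m[0] * x + m[2] * y + m[4], m[1] * x + m[3] * y + m[5]};
    }

    /** Step a: smallest upright rectangle enclosing the appearance BBox transformed by /Matrix. */
    static double[] transformedBBox(double[] bbox, double[] matrix) {
        double[][] corners = {
            apply(matrix, bbox[0], bbox[1]), apply(matrix, bbox[2], bbox[1]),
            apply(matrix, bbox[2], bbox[3]), apply(matrix, bbox[0], bbox[3])
        };
        double minX = Double.POSITIVE_INFINITY;
        double minY = Double.POSITIVE_INFINITY;
        double maxX = Double.NEGATIVE_INFINITY;
        double maxY = Double.NEGATIVE_INFINITY;
        for (double[] p : corners) {
            minX = Math.min(minX, p[0]);
            minY = Math.min(minY, p[1]);
            maxX = Math.max(maxX, p[0]);
            maxY = Math.max(maxY, p[1]);
        }
        return new double[] {minX, minY, maxX, maxY};
    }

    /** Step b: matrix A that scales and translates the transformed box onto the annotation /Rect. */
    static double[] fitToRect(double[] box, double[] rect) {
        double sx = (rect[2] - rect[0]) / (box[2] - box[0]);
        double sy = (rect[3] - rect[1]) / (box[3] - box[1]);
        return new double[] {sx, 0, 0, sy, rect[0] - box[0] * sx, rect[1] - box[1] * sy};
    }
}
```

Because `Do` concatenates the form XObject's own /Matrix with the current transformation matrix, emitting `A` via `cm` right before `Do` yields the overall mapping Matrix × A that the specification describes. For an unrotated 100 × 20 appearance and a widget /Rect of `[50 700 250 720]`, `fitToRect` returns `{2, 0, 0, 1, 50, 700}`.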
>
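To make the requested output concrete, the sketch below assembles the raw content-stream operators that the workaround is meant to produce. The field's form XObject is painted with `Do` inside `q ... Q`, and the whole sequence is enclosed in an `/Artifact` marked-content section opened with `BDC` and closed with `EMC`. This is plain Java string building for illustration, not PDFBox API; `ArtifactWrapper`, `wrapAsArtifact` and the resource name `Fm0` are hypothetical, and the `cm` operands are taken as given. The property list mirrors the /Type /Background and /BBox entries from the reporter's snippet and is written inline, which the PDF specification permits for `BDC`; PDFBox's `beginMarkedContent(COSName, PDPropertyList)` may instead register the dictionary as a named /Properties resource. Only the artifact case used in the snippet is covered.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/**
 * Hypothetical helper (not part of PDFBox): builds the raw content-stream
 * operators that paint a form XObject inside an /Artifact marked-content sequence.
 */
final class ArtifactWrapper {

    private ArtifactWrapper() {
    }

    /** Formats a number with at most 6 decimals, no exponent and no trailing zeros. */
    static String num(double v) {
        return BigDecimal.valueOf(v)
                .setScale(6, RoundingMode.HALF_UP)
                .stripTrailingZeros()
                .toPlainString();
    }

    /** Joins numbers with single spaces, as content-stream operands are written. */
    private static String join(double[] values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(num(values[i]));
        }
        return sb.toString();
    }

    /**
     * @param xobjectName resource name of the field's form XObject, e.g. "Fm0" (hypothetical)
     * @param bbox        artifact bounding box {llx, lly, urx, ury}
     * @param cm          matrix {a, b, c, d, e, f} that maps the appearance onto the widget
     */
    static String wrapAsArtifact(String xobjectName, double[] bbox, double[] cm) {
        StringBuilder sb = new StringBuilder();
        // Open the marked-content sequence with an inline property list.
        sb.append("/Artifact <</Type /Background /BBox [")
          .append(join(bbox))
          .append("]>> BDC\n");
        sb.append("q\n");                                   // save the graphics state
        sb.append(join(cm)).append(" cm\n");                // position the appearance
        sb.append('/').append(xobjectName).append(" Do\n"); // paint the form XObject
        sb.append("Q\n");                                   // restore the graphics state
        sb.append("EMC\n");                                 // close the marked-content sequence
        return sb.toString();
    }
}
```

For example, `wrapAsArtifact("Fm0", new double[]{0, 0, 612, 792}, new double[]{1, 0, 0, 1, 72, 700.5})` returns the lines `/Artifact <</Type /Background /BBox [0 0 612 792]>> BDC`, `q`, `1 0 0 1 72 700.5 cm`, `/Fm0 Do`, `Q` and `EMC`.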