[
https://issues.apache.org/jira/browse/PDFBOX-5528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620899#comment-17620899
]
Michael Klink commented on PDFBOX-5528:
---------------------------------------
Well, when flattening form fields into the static content and trying to
integrate them into the existing structure tree, one strictly speaking
needs to know more about how the flattened content _semantically_ fits in.
At least a good tagging result requires that. A tagging result that merely
passes automated tests does not, but human users who depend on the document's
accessibility may well complain.
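
To make the distinction concrete, here is a minimal sketch of the two ways a
flattened field could be wrapped in the page content stream: as an artifact,
or as real content carrying an MCID. It only builds the operator text with
plain Java and does not use PDFBox. The class and method names, the XObject
name Fm0, the structure type Span, the MCID value 7 and the identity matrix
are all assumptions made for illustration.

```java
import java.util.Locale;

/** Builds PDF content stream fragments as plain text; no PDF library involved. */
final class MarkedContentSketch {

    /** Wraps a form XObject invocation in an /Artifact marked content section. */
    static String asArtifact(String xObjectName) {
        return "/Artifact BMC\n" + drawForm(xObjectName) + "EMC\n";
    }

    /** Wraps the same invocation as real content with a structure type and an MCID. */
    static String asTaggedContent(String structureType, int mcid, String xObjectName) {
        return String.format(Locale.ROOT, "/%s <</MCID %d>> BDC\n", structureType, mcid)
                + drawForm(xObjectName) + "EMC\n";
    }

    /** Save state, apply a placeholder identity matrix, draw the form, restore state. */
    private static String drawForm(String xObjectName) {
        return "q\n1 0 0 1 0 0 cm\n/" + xObjectName + " Do\nQ\n";
    }

    public static void main(String[] args) {
        System.out.print(asArtifact("Fm0"));
        System.out.print(asTaggedContent("Span", 7, "Fm0"));
    }
}
```

The artifact variant needs nothing further. The tagged variant is only
meaningful once a structure element in the structure tree references that
MCID, and choosing the element type and its position in the tree is exactly
the semantic question raised above.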
> PDF/UA: Add marked content sections when flattening acro forms
> --------------------------------------------------------------
>
> Key: PDFBOX-5528
> URL: https://issues.apache.org/jira/browse/PDFBOX-5528
> Project: PDFBox
> Issue Type: Improvement
> Components: AcroForm
> Reporter: Andre Wachsmuth
> Priority: Minor
> Attachments: correct.png, wrong.png
>
>
> We need to support PDF/UA compliant documents to some extent. I noticed that
> when we take a PDF/UA compliant PDF document and flatten it via
> PDAcroForm#flatten, the resulting output is not PDF/UA compliant anymore.
> After a little bit of research, the problem is that PDFBox creates Do
> operators that draw the appearance of the form fields.
> According to the PDF/UA standard, such content needs to be enclosed in marked
> content sections (BMC ... EMC, BDC ... EMC, see attached images).
> By copying some code from PDAcroForm#flatten and adding
> contentStream.beginMarkedContent and contentStream.endMarkedContent myself, I
> can work around the problem, but that's less than ideal. It would be great if
> this could be included in PDFBox.
>
> {code:java}
> public void flatten(List<PDField> fields, boolean refreshAppearances) throws IOException {
> // ...
> final var dict = new COSDictionary();
> dict.setLong(COSName.MCID, mcid);
> dict.setItem(COSName.BBOX, bBox);
> dict.setItem(COSName.TYPE, COSName.BACKGROUND);
> final var propList = PDPropertyList.create(dict);
> contentStream.beginMarkedContent(COSName.ARTIFACT, propList);
> contentStream.saveGraphicsState();
> // see https://stackoverflow.com/a/54091766/1729265 for an explanation
> // of the steps required
> // this will transform the appearance stream form object into the
> // rectangle of the annotation bbox and map the coordinate systems
> final var transformationMatrix =
>     pdfbox_resolveTransformationMatrix(form, annotation, appearanceStream);
> contentStream.transform(transformationMatrix);
> contentStream.drawForm(fieldObject);
> contentStream.restoreGraphicsState();
> contentStream.endMarkedContent();
>
> // ...
> }{code}
>