[jira] [Comment Edited] (PDFBOX-5528) PDF/UA: Add marked content sections when flattening acro forms

Andre Wachsmuth (Jira) Wed, 19 Oct 2022 10:43:07 -0700


    [ https://issues.apache.org/jira/browse/PDFBOX-5528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620477#comment-17620477 ]


Andre Wachsmuth edited comment on PDFBOX-5528 at 10/19/22 5:42 PM:
-------------------------------------------------------------------

Ah yes, the MCID is a bit troublesome. I'm by no means an expert on the PDF 
spec, but as far as I understand, the exact value of the MCID does not matter 
much as long as it is unique? What I did for now is use the PDFStreamParser to 
collect all existing MCIDs from the document, then I use MCIDs that are 
different from those.
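
A minimal sketch of that allocation step, assuming the existing MCIDs have 
already been collected into a set (the PDFStreamParser pass is not shown). 
The class name McidAllocator and its methods are hypothetical and not part of 
PDFBox:

```java
import java.util.Collection;
import java.util.TreeSet;

/** Hypothetical helper, not part of PDFBox: hands out MCIDs that are not in use yet. */
final class McidAllocator {
    // MCIDs already present in the document plus the ones handed out so far.
    private final TreeSet<Integer> used = new TreeSet<>();
    // Smallest candidate that has not been checked yet.
    private int next = 0;

    McidAllocator(Collection<Integer> existingMcids) {
        used.addAll(existingMcids);
    }

    /** Returns the smallest non-negative MCID that has not been used so far. */
    int allocate() {
        while (used.contains(next)) {
            next++;
        }
        used.add(next);
        return next++;
    }
}
```

With existing MCIDs 0, 1 and 3, successive calls to allocate() would return 
2, 4 and 5.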

> And I guess we'd also have to add it in the resources and/or in the structure 
> tree.

That's why I'm hoping this can be included in PDFBox to handle this properly, 
since I'm not that familiar with the PDF spec.

 



> PDF/UA: Add marked content sections when flattening acro forms
> --------------------------------------------------------------
>
>                 Key: PDFBOX-5528
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5528
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: AcroForm
>            Reporter: Andre Wachsmuth
>            Priority: Minor
>         Attachments: correct.png, wrong.png
>
>
> We need to support PDF/UA compliant documents to some extent. I noticed that 
> when we take a PDF/UA compliant PDF document and flatten it via 
> PDAcroForm#flatten, the resulting output is no longer PDF/UA compliant.
> After a little research, the problem appears to be that PDFBox emits Do 
> operators that draw the paths representing the appearance of the form fields. 
> According to the PDF/UA standard, such content needs to be enclosed in marked 
> content sections (BMC ... EMC or BDC ... EMC, see the attached images).
> By copying some code from PDAcroForm#flatten and adding 
> contentStream.beginMarkedContent and contentStream.endMarkedContent myself, I 
> can work around the problem, but that is less than ideal. It would be great 
> if this could be included in PDFBox.
>  
> {code:java}
> public void flatten(List<PDField> fields, boolean refreshAppearances) throws IOException {
>     // ...
>     // Property list for the marked content section that wraps the flattened field.
>     final var dict = new COSDictionary();
>     dict.setLong(COSName.MCID, mcid);
>     dict.setItem(COSName.BBOX, bBox);
>     dict.setItem(COSName.TYPE, COSName.BACKGROUND);
>     final var propList = PDPropertyList.create(dict);
>     contentStream.beginMarkedContent(COSName.ARTIFACT, propList);
>     contentStream.saveGraphicsState();
>     // See https://stackoverflow.com/a/54091766/1729265 for an explanation of the
>     // steps required. This transforms the appearance stream form object into the
>     // rectangle of the annotation bbox and maps the coordinate systems.
>     final var transformationMatrix =
>             pdfbox_resolveTransformationMatrix(form, annotation, appearanceStream);
>     contentStream.transform(transformationMatrix);
>     contentStream.drawForm(fieldObject);
>     contentStream.restoreGraphicsState();
>     contentStream.endMarkedContent();
>     // ...
> }{code}
>  
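
For illustration, the operator sequence that such a wrapped form XObject 
amounts to in the page content stream can be sketched as plain text. This is 
not PDFBox code: the class ArtifactWrapSketch, the method wrap and the 
XObject name Fm0 are hypothetical, and the property dictionary is written 
inline and reduced to /Type (no /MCID or /BBox) to keep the sketch short.

```java
import java.util.Locale;

/** Hypothetical illustration only; PDFBox writes these operators through its content stream API. */
final class ArtifactWrapSketch {
    /**
     * Returns the content stream operators that draw the form XObject
     * xObjectName inside an /Artifact marked content sequence.
     * matrix holds the six operands a b c d e f of the cm operator.
     */
    static String wrap(String xObjectName, double[] matrix) {
        if (matrix.length != 6) {
            throw new IllegalArgumentException("matrix needs 6 entries");
        }
        StringBuilder sb = new StringBuilder();
        sb.append("/Artifact <</Type /Background>> BDC\n"); // begin marked content
        sb.append("q\n");                                   // save graphics state
        for (double v : matrix) {
            sb.append(String.format(Locale.ROOT, "%.2f ", v));
        }
        sb.append("cm\n");                                  // apply the transformation
        sb.append('/').append(xObjectName).append(" Do\n"); // draw the form XObject
        sb.append("Q\n");                                   // restore graphics state
        sb.append("EMC\n");                                 // end marked content
        return sb.toString();
    }
}
```

Calling wrap("Fm0", new double[] {1, 0, 0, 1, 10, 20}) yields the lines 
"/Artifact <</Type /Background>> BDC", "q", 
"1.00 0.00 0.00 1.00 10.00 20.00 cm", "/Fm0 Do", "Q" and "EMC", which 
corresponds to the BDC ... EMC bracketing described in the issue.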



