[jira] [Commented] (PDFBOX-4066) Merging documents with nested fields duplicates child fields

Maruan Sahyoun (JIRA) Wed, 17 Jan 2018 03:08:08 -0800

    [ 
https://issues.apache.org/jira/browse/PDFBOX-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16328622#comment-16328622
 ]


Maruan Sahyoun commented on PDFBOX-4066:
----------------------------------------

[~mkl] very kind of you to provide some feedback on that.
There are different strategies for merging AcroForm fields. Up to now, PDFBox 
has always treated form fields from additional documents as new fields, even 
when a field with the same name already exists (at root level or as a child). 
Adobe handles that situation differently: fields with the same fully qualified 
name (and the same type) are merged, i.e. the incoming field is treated as a 
new occurrence of the same field, and as a result its annotations are merged 
into the same field entry.
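
To make the difference between the two strategies concrete, here is a minimal 
sketch in plain Java. It does not use the PDFBox API: a form is modeled as a 
map from fully qualified field name to a list of widget annotation labels, and 
the class name, method names and the "_dummy" suffix are all hypothetical, 
chosen only for illustration. The type check mentioned above is left out for 
brevity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative model only; none of these names exist in PDFBox.
public class FieldMergeSketch {

    // Builds a one-field form: fully qualified name -> widget annotations.
    static Map<String, List<String>> singleField(String name, String widget) {
        Map<String, List<String>> form = new LinkedHashMap<>();
        List<String> widgets = new ArrayList<>();
        widgets.add(widget);
        form.put(name, widgets);
        return form;
    }

    // Deep copy so that neither strategy mutates its inputs.
    static Map<String, List<String>> copyOf(Map<String, List<String>> form) {
        Map<String, List<String>> copy = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : form.entrySet()) {
            copy.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
        return copy;
    }

    // Strategy 1: an incoming field is always a new field. On a name
    // clash it is renamed with a (hypothetical) "_dummy<n>" suffix.
    static Map<String, List<String>> mergeAsNewFields(
            Map<String, List<String>> dest, Map<String, List<String>> src) {
        Map<String, List<String>> result = copyOf(dest);
        int counter = 1;
        for (Map.Entry<String, List<String>> e : src.entrySet()) {
            String name = e.getKey();
            while (result.containsKey(name)) {
                name = e.getKey() + "_dummy" + counter++;
            }
            result.put(name, new ArrayList<>(e.getValue()));
        }
        return result;
    }

    // Strategy 2: fields with the same fully qualified name are joined,
    // so the incoming widgets are appended to the existing field entry.
    static Map<String, List<String>> mergeByQualifiedName(
            Map<String, List<String>> dest, Map<String, List<String>> src) {
        Map<String, List<String>> result = copyOf(dest);
        for (Map.Entry<String, List<String>> e : src.entrySet()) {
            result.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                  .addAll(e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc1 = singleField("form.field_a", "page1-widget");
        Map<String, List<String>> doc2 = singleField("form.field_a", "page2-widget");

        // Two separate fields, the second one renamed.
        System.out.println(mergeAsNewFields(doc1, doc2));
        // One field carrying both widgets.
        System.out.println(mergeByQualifiedName(doc1, doc2));
    }
}
```

With the first strategy the merged form ends up with two independent fields; 
with the second it has a single field whose value would be shared by both 
widgets.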

To me Adobe's strategy makes more sense, but there was a discussion in another 
ticket (I've forgotten the issue number) where the user didn't want the fields 
to be combined.

So for now I think the fix is OK, as it is in line with the current behavior of 
PDFBox. But we should consider changing that for the 3.x release. WDYT?

> Merging documents with nested fields duplicates child fields
> ------------------------------------------------------------
>
>                 Key: PDFBOX-4066
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4066
>             Project: PDFBox
>          Issue Type: Bug
>          Components: AcroForm, Utilities
>    Affects Versions: 2.0.8
>            Reporter: Al Phaba
>            Assignee: Maruan Sahyoun
>            Priority: Major
>             Fix For: 2.0.9, 3.0.0 PDFBox
>
>         Attachments: TestForm-flattened.pdf, TestForm-merged.pdf, 
> TestForm.pdf, flattenAndMerge.pdf
>
>
> I have a PDF with a lot of AcroForm fields. I do some manipulation on it, 
> which results in a new PDF. So I have PDF-1 (which is the original one) and 
> PDF-2 (just a duplicate of PDF-1), and now I want to merge them. Both PDFs 
> have some AcroForm fields, for example: field_a, field_2...
> Before I merge them I flatten PDF-1, because I only want to have the fields 
> from PDF-2. When I then check my new merged PDF, I can see that there are no 
> visible fields on the pages from PDF-1 and there are fields on the pages from 
> PDF-2. At first glance it seems OK, but when I inspect the fields I can see 
> that the merger has renamed all the fields from PDF-2, e.g. 
> field_a_dummy123, field_b_dummy232 ...
> It seems to me that flattening does not remove the fields, and that's why the 
> PDFMerger from PDFBox renames the fields from PDF-2, because field names 
> need to be unique. Another guess was that there is a bug in mergeAcroForm().
>  
> {code:java}
> @Test
> public void flattenAndMerge() throws IOException {
>     // classLoader is assumed to be a field of the surrounding test class
>     File testForm = new File(classLoader.getResource("./TestForm.pdf").getFile());
>     byte[] testFormAsByte = Files.readAllBytes(testForm.toPath());
>     byte[] testFormAsByte2 = Files.readAllBytes(testForm.toPath());
>
>     // Flatten the first copy and save it to a temporary file
>     Path flattenedPdf = Files.createTempFile("flatten", ".pdf");
>     try (PDDocument pdf1 = PDDocument.load(testFormAsByte)) {
>         PDAcroForm acroform = pdf1.getDocumentCatalog().getAcroForm();
>         acroform.flatten();
>         pdf1.save(flattenedPdf.toFile());
>     }
>
>     // Merge the flattened copy with the untouched second copy
>     PDFMergerUtility merger = new PDFMergerUtility();
>     merger.addSource(new ByteArrayInputStream(Files.readAllBytes(flattenedPdf)));
>     merger.addSource(new ByteArrayInputStream(testFormAsByte2));
>     merger.setDestinationFileName("./build/flattenAndMerge.pdf");
>     merger.mergeDocuments(MemoryUsageSetting.setupMainMemoryOnly());
> }
> {code}
> Here is my Stack Overflow question:
> [https://stackoverflow.com/questions/48271924/pdfbox-flatten-pdf-does-not-remove-acroform-elements?noredirect=1#comment83544858_48271924]
>  
>  



