[jira] [Comment Edited] (PDFBOX-4066) Merging documents with nested fields duplicates child fields

Michael Klink (JIRA) Wed, 17 Jan 2018 02:55:21 -0800

    [ 
https://issues.apache.org/jira/browse/PDFBOX-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16328611#comment-16328611
 ]


Michael Klink edited comment on PDFBOX-4066 at 1/17/18 10:54 AM:
-----------------------------------------------------------------

To make things a bit more complicated... ;)

Shouldn't a complete solution, in the case of a duplicate non-terminal root 
field, check whether the entries other than the child fields are identical and, 
if so, consider these top-level fields merged and continue by inspecting the 
child fields? If the child fields have distinct names, then all children of 
either duplicate can simply become children of the merged field.

For duplicate non-terminal child fields, the same consideration applies as for 
a duplicate non-terminal root field.

Even in the case of duplicate terminal fields whose entries other than the 
widgets are identical, one can consider merging them as multiple widgets of the 
same field. This should be optional, though, as it might not be wanted.

Only in the case of duplicate non-terminal or terminal fields with incompatible 
entries does one of the duplicates need to be renamed...

Ok, not trivial... ;)
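
To make the cases above concrete, here is a rough sketch of the decision logic 
on a simplified field tree. This is not PDFBox code: the Field class, the 
merge() method, the mergeTerminalWidgets flag and the "_renamed" suffix are all 
hypothetical names made up for illustration, and "terminal" is approximated as 
"has no kids".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified stand-in for a form field tree; NOT the PDFBox field model. */
final class FieldMergeSketch {

    static final class Field {
        String name;                                                // partial field name
        final Map<String, String> entries = new LinkedHashMap<>();  // entries other than kids/widgets
        final List<Field> kids = new ArrayList<>();                 // child fields (non-terminal)
        final List<String> widgets = new ArrayList<>();             // widget ids (terminal)

        Field(String name) {
            this.name = name;
        }

        // Simplification: a field without kids is treated as terminal.
        boolean isTerminal() {
            return kids.isEmpty();
        }
    }

    /** Merges 'incoming' into 'siblings' (root fields or the kids of one field). */
    static void merge(List<Field> siblings, Field incoming, boolean mergeTerminalWidgets) {
        Field existing = find(siblings, incoming.name);
        if (existing == null) {
            siblings.add(incoming);  // no name clash, nothing to reconcile
            return;
        }
        boolean compatible = existing.entries.equals(incoming.entries);
        if (compatible && !existing.isTerminal() && !incoming.isTerminal()) {
            // Duplicate non-terminal field with identical entries:
            // consider both merged and repeat the check for every child.
            for (Field kid : new ArrayList<>(incoming.kids)) {
                merge(existing.kids, kid, mergeTerminalWidgets);
            }
        } else if (compatible && existing.isTerminal() && incoming.isTerminal()
                && mergeTerminalWidgets) {
            // Optional: duplicate terminal fields become one field with several widgets.
            existing.widgets.addAll(incoming.widgets);
        } else {
            // Incompatible (or widget merging not wanted): rename one duplicate.
            String base = incoming.name;
            int n = 1;
            while (find(siblings, base + "_renamed" + n) != null) {
                n++;
            }
            incoming.name = base + "_renamed" + n;
            siblings.add(incoming);
        }
    }

    /** Returns the sibling with the given name, or null if there is none. */
    static Field find(List<Field> siblings, String name) {
        for (Field f : siblings) {
            if (f.name.equals(name)) {
                return f;
            }
        }
        return null;
    }
}
```

Calling merge(roots, field, false) for each root field of the second document 
would reuse a compatible parent and rename only the clashing terminal child, 
instead of renaming the whole subtree. A real implementation would also have to 
deal with inherited entries and with parent references, which this sketch 
ignores.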

 



> Merging documents with nested fields duplicates child fields
> ------------------------------------------------------------
>
>                 Key: PDFBOX-4066
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4066
>             Project: PDFBox
>          Issue Type: Bug
>          Components: AcroForm, Utilities
>    Affects Versions: 2.0.8
>            Reporter: Al Phaba
>            Assignee: Maruan Sahyoun
>            Priority: Major
>             Fix For: 2.0.9, 3.0.0 PDFBox
>
>         Attachments: TestForm-flattened.pdf, TestForm-merged.pdf, 
> TestForm.pdf, flattenAndMerge.pdf
>
>
> I have a PDF with a lot of AcroForm fields. I do some manipulation on it, 
> which results in a new PDF. So I have PDF-1 (the original) and PDF-2 (just a 
> duplicate of PDF-1), and now I want to merge them. Both PDFs have some 
> AcroForm fields, for example: field_a, field_2...
> Before I merge them I flatten PDF-1, because I only want to have the AcroForm 
> fields from PDF-2. When I then check my new merged PDF, I can see that there 
> are no visible fields on the pages from PDF-1 and that there are fields on 
> the pages from PDF-2. At first glance this seems OK, but when I inspect the 
> fields I can see that the merger has renamed all the fields from PDF-2, e.g. 
> field_a_dummy123, field_b_dummy232 ...
> It seems to me that flattening does not remove the fields, and that is why 
> the PDFMerger from PDFBox renames the fields from PDF-2, because AcroForm 
> field names need to be unique. Another guess was that there is a bug in 
> mergeAcroForm().
>  
> {code:java}
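> import java.io.ByteArrayInputStream;
> import java.io.File;
> import java.io.IOException;
> import java.nio.file.Files;
> import java.nio.file.Path;
> import org.apache.pdfbox.io.MemoryUsageSetting;
> import org.apache.pdfbox.multipdf.PDFMergerUtility;
> import org.apache.pdfbox.pdmodel.PDDocument;
> import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
> import org.junit.Test; // JUnit 4 assumed
>
> @Test
> public void flattenAndMerge() throws IOException {
>     // classLoader is assumed to be a field of the enclosing test class.
>     File testForm = new File(classLoader.getResource("./TestForm.pdf").getFile());
>     byte[] testFormAsByte = Files.readAllBytes(testForm.toPath());
>     byte[] testFormAsByte2 = Files.readAllBytes(testForm.toPath());
>     // Flatten the first copy and save it to a temporary file.
>     Path flattenedPdf = Files.createTempFile("flatten", ".pdf");
>     try (PDDocument pdf1 = PDDocument.load(testFormAsByte)) {
>         PDAcroForm acroform = pdf1.getDocumentCatalog().getAcroForm();
>         acroform.flatten();
>         pdf1.save(flattenedPdf.toFile());
>     }
>     // Merge the flattened copy with the untouched copy.
>     PDFMergerUtility merger = new PDFMergerUtility();
>     merger.addSource(new ByteArrayInputStream(Files.readAllBytes(flattenedPdf)));
>     merger.addSource(new ByteArrayInputStream(testFormAsByte2));
>     merger.setDestinationFileName("./build/flattenAndMerge.pdf");
>     merger.mergeDocuments(MemoryUsageSetting.setupMainMemoryOnly());
> }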
> {code}
> Here is my Stack Overflow question:
> [https://stackoverflow.com/questions/48271924/pdfbox-flatten-pdf-does-not-remove-acroform-elements?noredirect=1#comment83544858_48271924]
>  
>  


