[jira] [Commented] (PDFBOX-4475) PDFMergerUtility is very slow, almost in dead loop

Tilman Hausherr (JIRA) Wed, 27 Feb 2019 08:49:32 -0800


    [ https://issues.apache.org/jira/browse/PDFBOX-4475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16779497#comment-16779497 ]


Tilman Hausherr commented on PDFBOX-4475:
-----------------------------------------

Thanks, your screenshot points to a problem: 
{{clonedVersion.containsValue(base)}} is slow. The javadoc of 
{{Map.containsValue}} says that _This operation will probably require time 
linear in the map size for most implementations of the Map interface_. I have 
fixed that. The fix is too late for 2.0.14, but you can copy the source code 
of the merge and the clone class into your project and use those copies until 
2.0.15 is released.
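
To illustrate the {{containsValue}} point, here is a minimal sketch. It is not 
the actual PDFBox change; the class {{CloneRegistry}} and its method names are 
hypothetical. It keeps a second set holding the map's values, so that the "is 
this object already a clone?" check becomes a hash lookup instead of a scan 
over the whole map.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration only, not part of PDFBox.
final class CloneRegistry {
    // original object -> its clone
    private final Map<Object, Object> clonedVersion = new HashMap<>();
    // every clone created so far, kept in step with the map's values
    private final Set<Object> clonedValues = new HashSet<>();

    // Assumes each original is registered at most once; overwriting an
    // entry would leave the old clone behind in clonedValues.
    void register(Object original, Object clone) {
        clonedVersion.put(original, clone);
        clonedValues.add(clone);
    }

    // Scans the map's values: linear in the map size.
    boolean isCloneSlow(Object candidate) {
        return clonedVersion.containsValue(candidate);
    }

    // Hash lookup: expected constant time.
    boolean isCloneFast(Object candidate) {
        return clonedValues.contains(candidate);
    }
}
```

Both checks give the same answer as long as the set is updated together with 
the map. Only the cost per call differs, which matters when the check runs 
once for every object being cloned.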

The target file is still very large because we don't support compressed 
object streams. If this is important to you, postprocess your file with a 
utility like qpdf.
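
A possible way to run that postprocessing step from Java, as a sketch only. It 
assumes a {{qpdf}} executable is on the PATH; the class name 
{{QpdfPostProcess}} and the file names passed in are hypothetical. The 
{{--object-streams=generate}} option asks qpdf to write object streams.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper for illustration; assumes qpdf is installed and on the PATH.
final class QpdfPostProcess {
    // Builds the qpdf command line that rewrites 'in' to 'out' with object streams.
    static List<String> command(String in, String out) {
        return Arrays.asList("qpdf", "--object-streams=generate", in, out);
    }

    // Runs qpdf and returns its exit code (0 means success).
    static int run(String in, String out) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(in, out))
                .inheritIO() // show qpdf's own messages on the console
                .start();
        return process.waitFor();
    }
}
```

Calling {{QpdfPostProcess.run(merged, compressed)}} after 
{{mergeDocuments()}} would write a second, smaller copy of the merged file.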

Do you need the structure tree? It is only used for tagged PDFs, so if you 
don't need it, you can remove it. The cloning of the structure tree was 
introduced because people were unhappy that after merging, one couldn't close 
the source file (because the destination file was still referencing source 
streams).

You can't remove the "Don't clone a clone" segment: 7 build tests will fail. 
The final PDF ends up with pages that are not in the page tree; these are 
pages that were cloned twice or even more. This results in larger files and 
potential mayhem.
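
A toy model of why that check matters. This is not the PDFBox code; 
{{ToyCloner}}, {{Node}} and the {{guard}} flag are hypothetical and only mimic 
the idea. Without the check, passing an object that is already a clone 
produces a second copy that nothing else refers to.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy cloner for illustration only, not part of PDFBox.
final class ToyCloner {
    static final class Node {
        final String name;
        Node(String name) { this.name = name; }
    }

    private final Map<Node, Node> clonedVersion = new HashMap<>();
    private final Set<Node> clonedValues = new HashSet<>();
    private final boolean guard;
    // every copy this cloner has created
    final List<Node> created = new ArrayList<>();

    ToyCloner(boolean guard) { this.guard = guard; }

    Node cloneForNewDocument(Node base) {
        Node known = clonedVersion.get(base);
        if (known != null) {
            return known; // already cloned: reuse the existing copy
        }
        if (guard && clonedValues.contains(base)) {
            return base; // don't clone a clone
        }
        Node copy = new Node(base.name);
        clonedVersion.put(base, copy);
        clonedValues.add(copy);
        created.add(copy);
        return copy;
    }
}
```

With {{guard}} set to true, cloning a page and then cloning the result again 
returns the same object and {{created}} holds one entry. With {{guard}} set to 
false, the second call creates another copy and {{created}} holds two.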

> PDFMergerUtility is very slow, almost in dead loop
> --------------------------------------------------
>
>                 Key: PDFBOX-4475
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4475
>             Project: PDFBox
>          Issue Type: Bug
>            Reporter: Derek Liu
>            Priority: Critical
>         Attachments: TourInventory_TourInventory_report.pdf, 
> image-2019-02-27-15-43-42-304.png, screenshot-1.png
>
>
> When using PDFMergerUtility to merge PDF files, merging the struct tree is 
> very slow. It seems to be stuck in an endless loop.
>  !image-2019-02-27-15-43-42-304.png! 
> Test code:
> {code}
> package com.test;
> import java.io.IOException;
> import org.apache.pdfbox.multipdf.PDFMergerUtility;
> public class TestMergeUtil {
>   public static void main( String[] args ) throws IOException {
>     PDFMergerUtility merger = new PDFMergerUtility();
>     merger.addSource( "D:\\probe\\TourInventory_TourInventory_report.PDF" );
>     merger.addSource( "D:\\probe\\TourInventory_TourInventory_report.PDF" );
>     merger.setDestinationFileName( "D:\\probe\\TourInventory_TourInventory_report_merged.pdf" );
>     merger.mergeDocuments();
>   }
> }
> {code}
> But after I comment out the code below, the merge is fast, but the merged 
> PDF file is very large.
>  !screenshot-1.png! 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

