[ https://issues.apache.org/jira/browse/PDFBOX-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841939#comment-17841939 ]
Marcus Korinth commented on PDFBOX-5809:
----------------------------------------

[~tilman] Thank you for your in-depth analysis. Our splitter cleans up the remains you mentioned, and much more than the built-in splitter does, but only after the original document has been split. We do the clean-up afterwards so that we never change the original file by accident. It might be smarter, though, to temporarily change the original document (or to create a copy) before splitting it.

Is there a reason why it was so much faster in the 2.x versions? I also thought it was fast in 3.0.1. Is this likely to be fixed in the 3.0.3 release?

Thanks for your effort!

> PDDocument#importPage slowed down by factor 1300
> ------------------------------------------------
>
>                 Key: PDFBOX-5809
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5809
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Utilities
>    Affects Versions: 2.0.31, 3.0.2 PDFBox
>            Reporter: Marcus Korinth
>            Priority: Major
>             Fix For: 2.0.32, 4.0.0, 3.0.3 PDFBox
>
>         Attachments: image-2024-04-27-18-50-19-199.png
>
>
> We are using the *PDDocument#importPage* method in our own splitter, where we
> split pages from a _SourceDocument_ into a _TargetDocument_. To do so, we
> first extract the page using the following code:
> {code:java}
> final PDPage sourcePage = sourceDocument.getPage(pageNumber);
> {code}
> Immediately afterwards we call:
> {code:java}
> final PDPage targetPage = targetDocument.importPage(sourcePage);
> {code}
> This approach worked just fine with *pdfbox 2.0.26*.
> We decided to upgrade to version *3.0.2* since it tackles a lot of the
> problems.
> Unfortunately, the *PDDocument#importPage* method slowed down by a factor of
> about 1300. In version *2.0.26* it took 15 ms on average; with the latest
> *3.0.2* it takes 20000 ms on average. That is a huge deal breaker, as we
> usually have to split documents with several thousand pages.
> Note: The same applies when using *PDDocument#addPage*.
> Note: The problem does not appear in *3.0.1*, but we can't use that version
> because it has other major problems that break our application.
> I have prepared an example document with which you can reproduce the issue.
> Due to the file size limitation I had to prepare a WeTransfer link for you:
> https://we.tl/t-lfN2wz7cAs
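The per-page timing described in the issue can be measured with a small harness like the one below. This is a minimal sketch that assumes PDFBox 3.0.x is on the classpath. The class name `ImportPageTiming` and the method `splitRange` are hypothetical. Only `PDDocument#getPage`, `PDDocument#importPage`, `PDDocument#addPage` and `PDDocument#getNumberOfPages` come from PDFBox. The blank in-memory pages are only a stand-in and will not reproduce the slowdown. To reproduce it, you would need to load the reporter's document instead, for example with `Loader.loadPDF(File)`.

```java
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

// Hypothetical timing harness; not part of PDFBox.
public class ImportPageTiming {

    // Copies pages [from, to) of source into a new document using the
    // getPage/importPage sequence from the issue, printing the time each
    // importPage call takes. The caller must close the returned document.
    static PDDocument splitRange(PDDocument source, int from, int to) throws IOException {
        PDDocument target = new PDDocument();
        try {
            for (int pageNumber = from; pageNumber < to; pageNumber++) {
                final PDPage sourcePage = source.getPage(pageNumber);
                long start = System.nanoTime();
                target.importPage(sourcePage);
                long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                System.out.printf("page %d imported in %d ms%n", pageNumber, elapsedMs);
            }
            return target;
        } catch (IOException | RuntimeException e) {
            // Do not leak the partially built target document on failure.
            target.close();
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in source: five blank pages created in memory (assumption).
        try (PDDocument source = new PDDocument()) {
            for (int i = 0; i < 5; i++) {
                source.addPage(new PDPage());
            }
            // The source stays open for the target's whole lifetime. This is a
            // conservative assumption, because imported pages may still share
            // objects with the source document.
            try (PDDocument target = splitRange(source, 1, 4)) {
                System.out.println("target pages: " + target.getNumberOfPages());
            }
        }
    }
}
```

Timing each `importPage` call separately, rather than timing the whole split, shows whether the cost is uniform across pages or concentrated on particular pages of the real document.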