Hi.
I think I've found the change that caused the regression for me. My code
uses only addPage; nowhere do I use the import page feature. Usually, I
create a new PDPage object, add a content stream, and clone resources when
I reuse data from another document. The relevant difference in the code is
where the object keys are reset. So a fix that would solve my problem and
still fix PDFBox-5752 would be to move the reset logic into addPage:
-------------------------------------------
public void addPage(PDPage page)
{
    page.getCOSObject().resetImportedObjectKeys();
    getPages().add(page);
}
-------------------------------------------
This works for me, instead of resetting the keys before the addPage call in
the importPage function, as is currently done:
-------------------------------------------
public PDPage importPage(PDPage page) throws IOException
{
    PDPage importedPage = new PDPage(new COSDictionary(page.getCOSObject()), resourceCache);
    importedPage.getCOSObject().removeItem(COSName.PARENT);
    PDStream dest = new PDStream(this, page.getContents(), COSName.FLATE_DECODE);
    importedPage.setContents(dest);
    // reset imported object keys to avoid overlapping object numbers
    importedPage.getCOSObject().resetImportedObjectKeys();
    addPage(importedPage);
    importedPage.setCropBox(new PDRectangle(page.getCropBox().getCOSArray()));
    importedPage.setMediaBox(new PDRectangle(page.getMediaBox().getCOSArray()));
    importedPage.setRotation(page.getRotation());
    if (page.getResources() != null &&
            !page.getCOSObject().containsKey(COSName.RESOURCES))
    {
        LOG.warn("inherited resources of source document are not imported to destination page");
        LOG.warn("call importedPage.setResources(page.getResources()) to do this");
    }
    return importedPage;
}
-------------------------------------------
Maybe this change was intentional, but it will at least break code like this:
------------------------------------------------
PDPage newPage = new PDPage();
COSBase base = cloneUtility.cloneForNewDocument(page.getResources());
newPage.setResources(new PDResources((COSDictionary) base));
newPage.setMediaBox(page.getMediaBox());
newPage.setCropBox(page.getCropBox());
newPage.setTrimBox(page.getTrimBox());
newPage.setRotation(page.getRotation());
List<PDAnnotation> list = new ArrayList<>();
for (PDAnnotation annotation : page.getAnnotations()) {
    COSBase cloned = cloneUtility.cloneForNewDocument(annotation);
    list.add(PDAnnotation.createAnnotation(cloned));
}
if (list.size() > 0) {
    newPage.setAnnotations(list);
}
List<PDStream> newStream = new ArrayList<>();
Iterator<PDStream> it = page.getContentStreams();
while (it.hasNext()) {
    PDStream stream = it.next();
    newStream.add(stream);
}
newPage.setContents(newStream);
newDoc.addPage(newPage);
------------------------------------------------
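To make the term concrete: the "overlapping object numbers" that the reset
is meant to avoid can be shown with a toy model. This is not PDFBox code;
Obj, write, and the numbering scheme are hypothetical stand-ins made up for
illustration, assuming only that an object which keeps the key it had in
its source document can collide with a key already used in the destination.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ObjectKeyOverlapSketch {
    // Hypothetical stand-in for an indirect object; key == null means "not numbered yet".
    static final class Obj {
        Long key;
        final String payload;

        Obj(Long key, String payload) {
            this.key = key;
            this.payload = payload;
        }
    }

    // Hypothetical writer: keeps existing keys and numbers unkeyed objects
    // after the highest key already in use.
    static Map<Long, String> write(List<Obj> objects) {
        long next = 1;
        for (Obj o : objects) {
            if (o.key != null) {
                next = Math.max(next, o.key + 1);
            }
        }
        Map<Long, String> out = new HashMap<>();
        for (Obj o : objects) {
            if (o.key == null) {
                o.key = next++;
            }
            out.put(o.key, o.payload); // a duplicate key silently overwrites
        }
        return out;
    }

    public static void main(String[] args) {
        // Both objects carry key 5: one from the destination, one from the source.
        List<Obj> kept = new ArrayList<>();
        kept.add(new Obj(5L, "destination page content"));
        kept.add(new Obj(5L, "content taken from source document"));
        System.out.println("keys kept:  " + write(kept).size() + " of 2 objects survive");

        // Same objects, but the one from the source has its key reset first.
        List<Obj> reset = new ArrayList<>();
        reset.add(new Obj(5L, "destination page content"));
        Obj fromSource = new Obj(5L, "content taken from source document");
        fromSource.key = null; // the "reset" step
        reset.add(fromSource);
        System.out.println("keys reset: " + write(reset).size() + " of 2 objects survive");
    }
}
```

In this model the first run loses one object and the second keeps both. It
says nothing about where in PDFBox the reset has to happen; that is exactly
the question above.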
Best regards
Daniel
On Wed, Feb 4, 2026 at 7:19 PM Andreas Lehmkühler <[email protected]> wrote:
> Are you using the import page feature? The mentioned commit fixes an
> issue when importing pages containing objects with overlapping object
> numbers. Other scenarios are most likely not affected.
>
>
> Am 04.02.26 um 13:19 schrieb Daniel Persson:
> > Hi Andreas
> >
> > You are right, the commit that introduced the error is:
> > ----------------------------------------------------
> > commit 41c3a431e21c31a9cf6d6dec4b47a126bac2996f (HEAD)
> > Author: Andreas Lehmkühler <[email protected]>
> > Date: Tue Dec 16 07:20:09 2025 +0000
> >
> > PDFBOX-6036: avoid overlapping object keys when importing pages from
> > another pdf
> >
> > git-svn-id: https://svn.apache.org/repos/asf/pdfbox/branches/3.0@1930616
> > 13f79535-47bb-0310-9956-ffa450edef68
> > ----------------------------------------------------
> >
> > I still don't like the new implementation of COSWriterObjectStream. The
> > original thread-safe implementation is simpler to read and more correct,
> > but I understand if you want the change for performance reasons. Removing
> > the synchronization seems like the wrong way to do this. And looking at
> > the numbers, the original implementation handles memory and time better
> > in my comparisons.
> >
> > == Test 3.0.6 ==
> > iter=0 wall_ms=376.855 cpu_ms=322.945 alloc_mb=92.806
> > heap_before_mb=216.448 heap_after_mb=141.633 heap_delta_mb=-74.814
> > gc_count_delta=1 gc_time_ms_delta=5
> > iter=1 wall_ms=308.536 cpu_ms=302.087 alloc_mb=92.774
> > heap_before_mb=141.633 heap_after_mb=233.633 heap_delta_mb=92.000
> > gc_count_delta=0 gc_time_ms_delta=0
> > iter=2 wall_ms=327.187 cpu_ms=316.977 alloc_mb=92.774
> > heap_before_mb=233.633 heap_after_mb=327.635 heap_delta_mb=94.002
> > gc_count_delta=0 gc_time_ms_delta=0
> > iter=3 wall_ms=396.564 cpu_ms=381.406 alloc_mb=92.774
> > heap_before_mb=327.635 heap_after_mb=113.293 heap_delta_mb=-214.343
> > gc_count_delta=1 gc_time_ms_delta=5
> > iter=4 wall_ms=470.828 cpu_ms=469.447 alloc_mb=92.774
> > heap_before_mb=113.293 heap_after_mb=205.293 heap_delta_mb=92.000
> > gc_count_delta=0 gc_time_ms_delta=0
> > iter=5 wall_ms=528.677 cpu_ms=523.818 alloc_mb=94.127
> > heap_before_mb=205.293 heap_after_mb=301.293 heap_delta_mb=96.000
> > gc_count_delta=0 gc_time_ms_delta=0
> > iter=6 wall_ms=543.075 cpu_ms=533.379 alloc_mb=92.886
> > heap_before_mb=301.293 heap_after_mb=393.293 heap_delta_mb=92.000
> > gc_count_delta=0 gc_time_ms_delta=0
> > iter=7 wall_ms=489.066 cpu_ms=483.486 alloc_mb=92.886
> > heap_before_mb=393.293 heap_after_mb=485.293 heap_delta_mb=92.000
> > gc_count_delta=0 gc_time_ms_delta=0
> > iter=8 wall_ms=817.173 cpu_ms=512.633 alloc_mb=92.824
> > heap_before_mb=487.291 heap_after_mb=148.949 heap_delta_mb=-338.342
> > gc_count_delta=1 gc_time_ms_delta=4
> > iter=9 wall_ms=897.804 cpu_ms=343.736 alloc_mb=92.824
> > heap_before_mb=148.949 heap_after_mb=240.949 heap_delta_mb=92.000
> > gc_count_delta=0 gc_time_ms_delta=0
> >
> > == Test 3.0.7 ==
> > iter=0 wall_ms=507.491 cpu_ms=501.767 alloc_mb=94.896
> > heap_before_mb=195.869 heap_after_mb=170.142 heap_delta_mb=-25.726
> > gc_count_delta=1 gc_time_ms_delta=6
> > iter=1 wall_ms=495.749 cpu_ms=492.284 alloc_mb=94.861
> > heap_before_mb=170.142 heap_after_mb=266.142 heap_delta_mb=96.000
> > gc_count_delta=0 gc_time_ms_delta=0
> > iter=2 wall_ms=437.024 cpu_ms=435.485 alloc_mb=94.861
> > heap_before_mb=266.142 heap_after_mb=360.140 heap_delta_mb=93.998
> > gc_count_delta=0 gc_time_ms_delta=0
> > iter=3 wall_ms=478.096 cpu_ms=465.265 alloc_mb=94.861
> > heap_before_mb=360.140 heap_after_mb=127.227 heap_delta_mb=-232.913
> > gc_count_delta=1 gc_time_ms_delta=5
> > iter=4 wall_ms=1096.645 cpu_ms=509.049 alloc_mb=94.862
> > heap_before_mb=127.227 heap_after_mb=221.229 heap_delta_mb=94.002
> > gc_count_delta=0 gc_time_ms_delta=0
> > iter=5 wall_ms=1049.944 cpu_ms=319.307 alloc_mb=94.863
> > heap_before_mb=221.229 heap_after_mb=317.229 heap_delta_mb=96.000
> > gc_count_delta=0 gc_time_ms_delta=0
> > iter=6 wall_ms=1851.772 cpu_ms=343.559 alloc_mb=94.863
> > heap_before_mb=317.229 heap_after_mb=411.227 heap_delta_mb=93.998
> > gc_count_delta=0 gc_time_ms_delta=0
> > iter=7 wall_ms=485.274 cpu_ms=345.465 alloc_mb=94.863
> > heap_before_mb=411.227 heap_after_mb=116.257 heap_delta_mb=-294.970
> > gc_count_delta=1 gc_time_ms_delta=4
> > iter=8 wall_ms=407.939 cpu_ms=405.857 alloc_mb=94.802
> > heap_before_mb=116.257 heap_after_mb=210.259 heap_delta_mb=94.002
> > gc_count_delta=0 gc_time_ms_delta=0
> > iter=9 wall_ms=689.281 cpu_ms=380.940 alloc_mb=94.801
> > heap_before_mb=210.259 heap_after_mb=304.257 heap_delta_mb=93.998
> > gc_count_delta=0 gc_time_ms_delta=0
> >
> > But maybe you've seen another trend using another profiling tool.
> >
> > The results above were created by a ChatGPT testing tool that warms the
> > code 5 times and then tests it 10 times while outputting the result.
> > The PDFBox code I ran loaded a PDF and saved the document without any
> > changes.
> >
> > Full code here:
> > https://github.com/kalaspuffar/PDFBoxTestBase/blob/main/src/main/java/TestingPerformance.java
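
The warm-up and measurement loop described above can be sketched roughly as
follows. This is not the linked TestingPerformance.java: the class name and
the placeholder workload (an 8 MB array fill instead of loading and saving
a PDF) are assumptions, and the alloc_mb and gc_time_ms columns are left
out to keep the sketch on standard java.lang.management calls.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.Arrays;
import java.util.Locale;

public class WarmupBenchmarkSketch {
    static final int WARMUP_RUNS = 5;
    static final int MEASURED_RUNS = 10;
    static final double MB = 1024.0 * 1024.0;

    // Sum of collection counts over all collectors; -1 means "undefined" and counts as 0.
    static long gcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionCount());
        }
        return total;
    }

    static double usedHeapMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MB;
    }

    // Runs the workload once and returns one result line in the style of the logs above.
    static String measure(int iter, Runnable workload) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        boolean cpuSupported = threads.isCurrentThreadCpuTimeSupported();
        double heapBefore = usedHeapMb();
        long gcBefore = gcCount();
        long cpuStart = cpuSupported ? threads.getCurrentThreadCpuTime() : 0L;
        long wallStart = System.nanoTime();
        workload.run();
        long wallNs = System.nanoTime() - wallStart;
        long cpuNs = cpuSupported ? threads.getCurrentThreadCpuTime() - cpuStart : 0L;
        double heapAfter = usedHeapMb();
        return String.format(Locale.ROOT,
                "iter=%d wall_ms=%.3f cpu_ms=%.3f heap_before_mb=%.3f heap_after_mb=%.3f"
                        + " heap_delta_mb=%.3f gc_count_delta=%d",
                iter, wallNs / 1e6, cpuNs / 1e6, heapBefore, heapAfter,
                heapAfter - heapBefore, gcCount() - gcBefore);
    }

    public static void main(String[] args) {
        // Placeholder workload; the real test loads and saves a PDF here.
        Runnable workload = () -> {
            byte[] buffer = new byte[8 * 1024 * 1024];
            Arrays.fill(buffer, (byte) 1);
        };
        for (int i = 0; i < WARMUP_RUNS; i++) {
            workload.run(); // warm-up runs are not reported
        }
        for (int i = 0; i < MEASURED_RUNS; i++) {
            System.out.println(measure(i, workload));
        }
    }
}
```

With only ten measured runs and no forced GC between them, single outliers
in wall time are expected, so the numbers are best read as a trend.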
> >
> > Best regards
> > Daniel
> >
> > On Wed, Feb 4, 2026 at 8:28 AM Andreas Lehmkühler <[email protected]> wrote:
> >
> >> Hmmm, the first commit introduced a regression which ended up in crashes
> >> and the second one fixed the regression. The whole change was about
> >> compressed object streams which shall not contain already compressed
> >> objects such as content streams using FlateFilter as filter. Saying
> >> that, I'm hesitant to believe that your issue is related to those
> >> changes. Maybe another commit between those commits is the root cause.
> >>
> >> Without some sample code it is fishing in troubled waters.
> >>
> >>
> >> Am 03.02.26 um 18:17 schrieb Daniel Persson:
> >>> Hi Andreas
> >>>
> >>> It's in 3.0.7. I ran a bunch of commits in order to figure out when the
> >>> issue was introduced.
> >>>
> >>> 87011ade3 fail
> >>> f3bb496975ee6ca6ae98c00c0e50cfc4375a3f8a fail
> >>> 7ee6d390278fd0b06668ec65ede14810c6075ec9 crash
> >>> 26283807ad crash
> >>> dd76acd546 crash
> >>> 2fef081c714d8c6524aab118e2bfec7cf379e45a crash
> >>> 08bc6fdd5200966309787a8188c3d7d5827b170a crash
> >>> 3800af7bc5d8f08af99a653b37f8e4cd67bf1659 crash
> >>> 1d4ae695a83c33999bda78a1d9f8c43512940965 crash
> >>> 1ac4a24f8f7dfd08924ef9645246656ad3b9b33a crash
> >>> 994b87e2b4d30ac2435cff9fe20ecdfc6ab1b916 crash
> >>> f82d2224a047bc642f1d38ff18360c61eaf9cccf success
> >>> d7d34f25cec7f4884e8f599ed620b2c3c704017b success
> >>> 045d17604640a68b798027300f690f0af2b1a95d success
> >>> cdffe505e8bdeb5810456c1e6d9df61c7e2aab85 success
> >>> 304ab0027d18fc8df5638f39bac033a55769dc4e success
> >>> 222fb5f3b32fdb20f11107919700a80d1dcc130e success
> >>>
> >>> Newer commits on top.
> >>>
> >>> So the two pivotal commits we have are:
> >>>
> >>> --------------------------------------------------
> >>> commit 994b87e2b4d30ac2435cff9fe20ecdfc6ab1b916 (head)
> >>> Author: Andreas Lehmkühler <[email protected]>
> >>> Date: Sat Dec 6 12:32:10 2025 +0000
> >>>
> >>> PDFBOX-5169: reduce the memory footprint by reusing the internal byte
> >>> array instead of copying it
> >>>
> >>> git-svn-id: https://svn.apache.org/repos/asf/pdfbox/branches/3.0@1930285
> >>> 13f79535-47bb-0310-9956-ffa450edef68
> >>> --------------------------------------------------
> >>> After this one the created PDF could not be rendered in poppler.
> >>>
> >>> Next we have this:
> >>> --------------------------------------------------
> >>> commit f3bb496975ee6ca6ae98c00c0e50cfc4375a3f8a (HEAD)
> >>> Author: Andreas Lehmkühler <[email protected]>
> >>> Date: Sat Jan 10 11:25:01 2026 +0000
> >>>
> >>> PDFBOX-6142: take the size of the stream into account when accessing
> >>> the data of the underlying byte array
> >>>
> >>> git-svn-id: https://svn.apache.org/repos/asf/pdfbox/branches/3.0@1931215
> >>> 13f79535-47bb-0310-9956-ffa450edef68
> >>> --------------------------------------------------
> >>> This one sometimes stores a COSDictionary instead of a COSStream for the
> >>> contents of the document.
> >>>
> >>> Best regards
> >>> Daniel
> >>>
> >>>
> >>> On Tue, Feb 3, 2026 at 4:34 PM Andreas Lehmkühler <[email protected]> wrote:
> >>>
> >>>>
> >>>>
> >>>> Am 03.02.26 um 15:46 schrieb Daniel Persson:
> >>>>> Hi again.
> >>>>>
> >>>>> Sorry to say that this version is still not great.
> >>>> Thanks for the feedback
> >>>>
> >>>>>
> >>>>> -1.
> >>>>>
> >>>>> I have not figured out what is going on because we do a lot of
> >>>>> operations, but when I process a file with multiple pages (48), do all
> >>>>> our operations, and then save it again, I get a bunch of blank pages.
> >>>>> The first 38 pages don't get a COSStream saved for the content stream;
> >>>>> instead, a COSDictionary with the length and filter is written:
> >>>>>
> >>>>> Filter: FlateDecode
> >>>>> Length: 7820
> >>>>>
> >>>>> So the first 38 pages are blank, and the last 10 are stored correctly.
> >>>>> This is a change from the previous version of PDFBox.
> >>>>>
> >>>>> I am trying to create a minimal example to show this issue, and I am
> >>>>> sending this email in case someone has an idea why I see this.
> >>>> Is this new in 3.0.7?
> >>>>
> >>>>
> >>>>>
> >>>>> Best regards
> >>>>> Daniel
> >>>>>
> >>>>> On Mon, Feb 2, 2026 at 6:14 PM Andreas Lehmkühler <[email protected]> wrote:
> >>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> a candidate for the PDFBox 3.0.7 release is available at:
> >>>>>>
> >>>>>> https://dist.apache.org/repos/dist/dev/pdfbox/3.0.7/
> >>>>>>
> >>>>>> The release candidate is a zip archive of the sources in:
> >>>>>>
> >>>>>> https://svn.apache.org/repos/asf/pdfbox/tags/3.0.7/
> >>>>>>
> >>>>>> The SHA-512 checksum of the archive is
> >>>>>>
> >>>>>> bf863c69225821d93d4a4cf86b4dae59c93211651ca72bfbf5da7dfcf6a480b3d7b8c0ea672adbba789afd0e79481ec8883da15e29c5fa31cba564aa8cfc89d0.
> >>>>>>
> >>>>>> Please vote on releasing this package as Apache PDFBox 3.0.7.
> >>>>>> The vote is open for the next 72 hours and passes if a majority of at
> >>>>>> least three +1 PDFBox PMC votes are cast.
> >>>>>>
> >>>>>> [ ] +1 Release this package as Apache PDFBox 3.0.7
> >>>>>> [ ] -1 Do not release this package because...
> >>>>>>
> >>>>>>
> >>>>>> Here is my +1
> >>>>>>
> >>>>>> Andreas
> >>>>>>
> >>>>>>
> >>>>>> ---------------------------------------------------------------------
> >>>>>> To unsubscribe, e-mail: [email protected]
> >>>>>> For additional commands, e-mail: [email protected]