[ https://issues.apache.org/jira/browse/PDFBOX-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711074#comment-16711074 ]
Tilman Hausherr commented on PDFBOX-4396:
-----------------------------------------

The resource cache is not to be shared across documents. The key is COSObject, i.e. an indirect object number. In a PDF file you see these as "10 0 R", and the objects as "10 0 obj".

I don't know what internal comment you mean (you didn't quote it), but there is a weakness somewhere: scratch file buffers are not closed properly, and this is done in finalization. There are JIRA issues on this, e.g. PDFBOX-3388 and PDFBOX-3359.

If calling gc yourself solves your problem, then Java is to blame, because it should run a gc by itself when memory is too low to allocate new objects.

If you can reproduce a scenario that eats up available memory then please share the PDF and the code.

> Memory leak due to soft reference caching
> -----------------------------------------
>
>                 Key: PDFBOX-4396
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4396
>             Project: PDFBox
>          Issue Type: Bug
>   Affects Versions: 2.0.12
>         Environment: JDK10; G1
>            Reporter: Ben Manes
>            Priority: Major
>         Attachments: #2 - memory leak 2.png, #2 - memory leak.png,
>                      memory leak 2.png, memory leak.png
>
>
> In a heap dump, it appears that DefaultResourceCache is retaining 5.3 GB of
> memory due to buffered images (via PDImageXObject). I suspect that G1 is not
> collecting soft references across all regions before it throws an
> out-of-memory error.
> In PDFBOX-4389, I discovered very slow PDDocument#load times due to a JDK10
> I/O bug. Previously I was loading the document to render each page, but this
> took 1.5 minutes. To work around that bug I reused the document instance
> across pages. This seems to have failed because the pages were cached and
> not cleared by the GC.
> The DefaultResourceCache does not prune its cache entries when the soft
> references are collected. Like WeakHashMap, it should use a ReferenceQueue,
> poll it on every access, and prune accordingly.
> Thankfully PDDocument#setResourceCache exists.
> For now I am going to reset the cache to a new instance after a page has
> been rendered. The entries should no longer be reachable and be GC'd more
> aggressively. If that doesn't work, I'll either replace the cache (e.g.
> with Caffeine) or disable it by setting the instance to null.
> I think the desired fix is to prune the DefaultResourceCache and, ideally,
> reconsider usage of soft references (as they tend to be poor in practice).
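
For illustration, the pruning the report asks for (soft references registered with a ReferenceQueue that is polled on every access, as WeakHashMap does for its keys) might look roughly like the sketch below. This is not PDFBox code: PruningSoftCache and all of its members are hypothetical names, and the real DefaultResourceCache may be structured quite differently.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical cache (not PDFBox code): soft values, stale entries pruned on every access. */
class PruningSoftCache<K, V> {

    /** Soft reference that remembers its key so the map entry can be removed later. */
    private static final class Entry<K, V> extends SoftReference<V> {
        final K key;

        Entry(K key, V value, ReferenceQueue<V> queue) {
            super(value, queue);
            this.key = key;
        }
    }

    private final Map<K, Entry<K, V>> map = new HashMap<>();
    private final ReferenceQueue<V> queue = new ReferenceQueue<>();

    /** Returns the cached value, or null if absent or already collected. */
    public synchronized V get(K key) {
        prune();
        Entry<K, V> entry = map.get(key);
        return entry == null ? null : entry.get();
    }

    public synchronized void put(K key, V value) {
        prune();
        map.put(key, new Entry<>(key, value, queue));
    }

    /** Number of entries still held after dropping the collected ones. */
    public synchronized int size() {
        prune();
        return map.size();
    }

    /** Drains the queue and removes the map entries whose values were collected. */
    private void prune() {
        Reference<? extends V> ref;
        while ((ref = queue.poll()) != null) {
            @SuppressWarnings("unchecked")
            Entry<K, V> stale = (Entry<K, V>) ref;
            // Two-argument remove: only drop the mapping if it still points at
            // this exact reference, so a newer value for the same key survives.
            map.remove(stale.key, stale);
        }
    }
}
```

A sketch like this only stops the map from accumulating cleared references; it does not change when the JVM decides to clear soft references, so it would not by itself address the suspicion about G1 in the report.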