Thanks for the reply.

> Why should the list not be kept? We need it for when the file is saved.

I need to study that code a bit more; there is a lot going on there that I don't yet understand. What I was wondering is whether there might be an alternative to keeping the stream object in memory, such as storing the necessary metadata for it in a smaller structure. Maybe the stream is the perfect object for this. However, at 4K or more apiece, and one per page, the memory use scales at least linearly with the number of pages. With "normal" documents this is not an issue, but when the number of pages gets large, the overhead is significant. We had someone try to create a PDF from a 25,000-page text source: 25,000 * 4K is about 100 megabytes. If it were possible to not keep any data in the ScratchFileBuffer, it would scale a bit better.

Thanks again,

Mark Claassen
Senior Software Engineer

Donnell Systems, Inc.
130 South Main Street
Leighton Plaza Suite 375
South Bend, IN 46601
E-mail: mailto:[email protected]
Voice: (574)232-3784
Fax: (574)232-4014

Disclaimer:
The opinions provided herein do not necessarily state or reflect those of Donnell Systems, Inc.(DSI). DSI makes no warranty for and assumes no legal liability or responsibility for the posting.

-----Original Message-----
From: Tilman Hausherr <[email protected]>
Sent: Thursday, June 10, 2021 12:02 PM
To: [email protected]
Subject: [Possible Spam] Re: PDF Memory issue
Importance: Low

Why should the list not be kept? We need it for when the file is saved.

Tilman

On 10.06.2021 at 03:07, Mark A. Claassen wrote:
> (This was started on the users list, but I am switching over to the dev list.)
>
> I found the issue. I have a bunch of small pages. The COSDocument keeps a list of the streams that have been created. The problem is that the currentPage in the ScratchFileBuffer is always in memory. If there are 40,000 pages, then this will add up to 40,000 * the page size (4096), which is over 160,000,000 bytes.
>
> So, now I am not sure how to deal with this.
> Each page has a PDFPageContentStream, which creates a ScratchFileBuffer. This ScratchFileBuffer is kept in the list of streams. I could recompile with a smaller page size, but that would only cut the problem by a percentage. Does anyone think it may be possible to change this to not maintain the list of streams? Or maybe clear the currentPage byte array for the items in the list?
>
> I am willing to do some work on this, but a little guidance (or realism) would be helpful before I get too deep into this.
>
> Thanks,
>
> Mark Claassen
>
> -----Original Message-----
> From: Mark A. Claassen <[email protected]>
> Sent: Wednesday, June 9, 2021 4:53 PM
> To: [email protected]
> Subject: [Possible Spam] RE: PDF Memory issue
> Importance: Low
>
> In looking at this further, it seems that the ScratchFileBuffer.close method is only called when the document is closed. ScratchFileBuffer.clear is never called.
>
> These are the only places where pageHandler.markPagesAsFree is called. I believe this is the issue: since markPagesAsFree is never called, this content just keeps building up until the document is closed.
>
> Any guidance would be greatly appreciated. I can't seem to find a configuration workaround for this issue.
>
> Mark Claassen
>
> -----Original Message-----
> From: Mark A. Claassen <[email protected]>
> Sent: Wednesday, June 9, 2021 1:39 PM
> To: [email protected]
> Subject: [Possible Spam] PDF Memory issue
> Importance: Low
>
> Hi. Thanks for your time.
>
> I am using PDFBox and am having trouble creating large PDFs (50,000+ pages). The heap size of the process is capped, but with the temp file active (which I can see being created) I didn't think this would matter.
>
> Here is what I am doing in a very condensed form:
>     MEMORY_SETTING = MemoryUsageSetting.setupTempFileOnly();
>     PDDocument doc = new PDDocument(MEMORY_SETTING);
>
>     for (...) {
>         String text = [generate page text]
>         PDPage page = new PDPage(PDRectangle.LETTER);
>         try (PDPageContentStream contentStream = new PDPageContentStream(doc, page,
>                 PDPageContentStream.AppendMode.OVERWRITE, false)) {
>
>             contentStream.endText();
>             doc.addPage(page);
>         }
>     }
>
> When I do a heap dump, I see over 100 MB of memory taken by 42,000 instances of ScratchFileBuffer.currentPage.
>
> Is there something I am doing wrong here? Or is this a bug? It seems like I must be doing something wrong / forgetting to do something, since this is a problem in 2 and 3-RC1.
>
> Thanks again,
>
> Mark Claassen

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
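
For reference, the memory arithmetic used in this thread can be reproduced with a minimal sketch. It assumes, as described above, that every content stream's ScratchFileBuffer keeps exactly one 4096-byte currentPage on the heap until the document is closed. The class and method names are hypothetical and are not part of PDFBox; real heap use will also include per-object overhead that this ignores.

```java
public final class ScratchMemoryEstimate {

    /** Page size cited in the thread for ScratchFileBuffer (assumed, in bytes). */
    public static final int PAGE_SIZE = 4096;

    private ScratchMemoryEstimate() {
    }

    /**
     * Bytes retained on the heap if each open buffer holds one page of
     * pageSize bytes. Throws ArithmeticException on overflow.
     */
    public static long retainedBytes(long buffers, int pageSize) {
        return Math.multiplyExact(buffers, (long) pageSize);
    }

    public static void main(String[] args) {
        // Page counts mentioned in the thread: 25,000, 40,000 and 50,000+.
        long[] pageCounts = {25_000L, 40_000L, 50_000L};
        for (long pages : pageCounts) {
            long bytes = retainedBytes(pages, PAGE_SIZE);
            double mib = bytes / (1024.0 * 1024.0);
            System.out.printf("%d pages -> %d bytes (%.1f MiB)%n", pages, bytes, mib);
        }
    }
}
```

Under this assumption the retained memory grows linearly with the page count, so a smaller page size lowers the slope but does not change the scaling.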
