Ah. Too bad. Note that if the byte arrays are immutable (or at least treated as such) and held in a wrapper object (such as ByteBuffer) with, as I indicated, proper .equals() and .hashCode() implementations, object pooling can still be effective.
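A minimal sketch of that pooling idea, under stated assumptions: the Pool class and the wrap() helper are hypothetical names invented for illustration, and nothing here is PDFBox code. It relies on ByteBuffer comparing and hashing by its remaining content, and it only saves memory when many arrays hold identical bytes, which may not be true of scratch-file pages.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentHashMap;

public class PoolSketch {

    /** Hypothetical helper: hands back one canonical instance per distinct value. */
    public static final class Pool<T> {
        private final ConcurrentHashMap<T, T> canon = new ConcurrentHashMap<>();

        public T intern(T value) {
            // putIfAbsent returns the previously stored instance, or null if
            // this value was just added, so fall back to the argument then.
            T existing = canon.putIfAbsent(value, value);
            return existing != null ? existing : value;
        }

        public int size() {
            return canon.size();
        }
    }

    /**
     * Copies the bytes and wraps them read-only so the pooled content cannot
     * be changed through the buffer. ByteBuffer.equals()/hashCode() depend on
     * the remaining bytes, so callers should read via duplicate() and leave
     * the pooled buffer's position untouched.
     */
    public static ByteBuffer wrap(byte[] bytes) {
        return ByteBuffer.wrap(bytes.clone()).asReadOnlyBuffer();
    }

    public static void main(String[] args) {
        Pool<ByteBuffer> pool = new Pool<>();
        ByteBuffer a = pool.intern(wrap(new byte[] {1, 2, 3}));
        ByteBuffer b = pool.intern(wrap(new byte[] {1, 2, 3})); // same content as a
        ByteBuffer c = pool.intern(wrap(new byte[] {4, 5, 6}));

        System.out.println(a == b);      // true: the second call returned the pooled instance
        System.out.println(a == c);      // false: different content
        System.out.println(pool.size()); // 2
    }
}
```

The same Pool works for strings (Pool<String>), since String already has content-based equals() and hashCode().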
Good luck!

Mel

Dr. Mel Martinez
[email protected]

> On Jun 10, 2021, at 4:57 PM, Mark A. Claassen <[email protected]> wrote:
>
> Thanks for the tips. I don't think they will help here, however. The 4K
> object that is being held is a byte array.
>
> Thanks again,
>
> Mark Claassen
> Senior Software Engineer
>
> Donnell Systems, Inc.
> 130 South Main Street
> Leighton Plaza Suite 375
> South Bend, IN 46601
> E-mail: mailto:[email protected]
> Voice: (574)232-3784
> Fax: (574)232-4014
>
> -------------------------------------------
> Confidentiality Notice: OCIESERVICE
> -------------------------------------------
> The contents of this e-mail message and any attachments are intended solely
> for the addressee(s) named in this message. This communication is intended to
> be and to remain confidential. If you are not the intended recipient of this
> message, or if this message has been addressed to you in error, please
> immediately alert the sender by reply e-mail and then delete this message and
> its attachments. Do not deliver, distribute, copy, disclose the contents or
> take any action in reliance upon the information contained in the
> communication or any attachments.
>
>
> -----Original Message-----
> From: Martinez, Mel - 0441 - MITLL <[email protected]>
> Sent: Thursday, June 10, 2021 4:28 PM
> To: [email protected]
> Cc: Martinez, Mel - 0441 - MITLL <[email protected]>
> Subject: Re: [Possible Spam] Re: PDF Memory issue
> Importance: Low
>
> I haven’t looked at this particular code at all, but I’m guessing that a LOT
> of the objects being referenced are strings, possibly identical strings?
>
> It may be useful to implement object (string) pooling. That can save a ton
> of memory.
>
> Do not use the built-in String.intern() function for this, though. That is
> limited and slow. It’s better to build the string pool around something
> like ConcurrentHashMap.putIfAbsent().
>
> You then need to rewrite the code to do a pool check whenever new strings are
> created / input.
>
>
> // Do this once and make it available to all your code, whether as a
> // singleton or static.
> ConcurrentHashMap<String, String> stringPool = new ConcurrentHashMap<>();
>
> String s = someStepThatCreatesOrInputsAString();
>
> // Add this step everywhere. putIfAbsent() returns null the first time a
> // string is seen, so only replace s when a pooled copy already exists.
> String pooled = stringPool.putIfAbsent(s, s);
> if (pooled != null) s = pooled;
>
>
> This imposes a very tiny lookup cost with every putIfAbsent() call but it’s
> pretty small and benchmarks you can find on the ’net show it’s still way
> faster than String.intern(), especially for large n. The putIfAbsent()
> call is atomic and this is perfectly thread safe.
>
> The end result is that you can enforce that you will have only one copy of
> any string in memory, regardless of how many references you might have of it.
>
> For non-String objects, this can also be used but it’s important that the
> object have proper .equals() and .hashCode() methods implemented.
>
> I hope this suggestion is helpful. This pattern saved me massive amounts of
> memory in clients pulling data from cloud databases.
>
> If it doesn’t make sense to apply in this particular code, then hopefully it
> will still prove a useful tip for someone somewhere else.
>
> Cheers,
>
> Mel
>
> Dr. Mel Martinez
> [email protected]
>
>
>
>> On Jun 10, 2021, at 1:07 PM, Mark A. Claassen <[email protected]> wrote:
>>
>> Thanks for the reply.
>>
>>> Why should the list not be kept? We need it for when the file is saved.
>>
>> I need to study that code a bit more, there is a lot going on there that I
>> don't yet understand. What I was thinking was if there might be an
>> alternative to keeping the stream object in memory, like storing the
>> necessary metadata for it in a smaller structure.
>>
>> Maybe the stream is the perfect object for this. However, at 4K or more a
>> piece, and one per page, this scales at least linearly with the number of
>> pages. When dealing with "normal" documents, this is not an issue. But
>> when the number of pages gets large, this overhead is significant.
>>
>> We had someone try to create a PDF from a 25,000 page text source. 25,000 *
>> 4K is 100 megabytes. If it was possible to not maintain any data in the
>> ScratchFileBuffer, it would scale a bit better.
>>
>> Thanks again,
>>
>> Mark Claassen
>>
>> Disclaimer:
>> The opinions provided herein do not necessarily state or reflect
>> those of Donnell Systems, Inc.(DSI). DSI makes no warranty for and
>> assumes no legal liability or responsibility for the posting.
>>
>>
>> -----Original Message-----
>> From: Tilman Hausherr <[email protected]>
>> Sent: Thursday, June 10, 2021 12:02 PM
>> To: [email protected]
>> Subject: [Possible Spam] Re: PDF Memory issue
>> Importance: Low
>>
>> Why should the list not be kept? We need it for when the file is saved.
>>
>> Tilman
>>
>> On 10.06.2021 at 03:07, Mark A. Claassen wrote:
>>> (This was started on the users list, but I am switching over to the
>>> dev list.)
>>>
>>> I found the issue. I have a bunch of small pages. The COSDocument keeps a
>>> list of the streams that have been created. The problem is that the
>>> currentPage in the ScratchFileBuffer is always in memory. If there are
>>> 40,000 pages, then this will add up to 40,000 * the page size (4096) which
>>> is over 160,000,000.
>>>
>>> So, now I am not sure how to deal with this. Each page has a
>>> PDPageContentStream, which creates a ScratchFileBuffer.
>>> This ScratchFileBuffer is kept in the list of streams. I could recompile
>>> with a smaller page size, but that will only cut the problem by a
>>> percentage. Does anyone think it may be possible to change this to not
>>> maintain the list of streams? Or maybe clear the currentPage byte array
>>> for the items in the list?
>>>
>>> I am willing to do some work on this, but a little guidance (or realism)
>>> would be helpful before I get too deep into this.
>>>
>>> Thanks,
>>>
>>> Mark Claassen
>>>
>>> -----Original Message-----
>>> From: Mark A. Claassen <[email protected]>
>>> Sent: Wednesday, June 9, 2021 4:53 PM
>>> To: [email protected]
>>> Subject: [Possible Spam] RE: PDF Memory issue
>>> Importance: Low
>>>
>>> In looking at this further, it seems that the ScratchFileBuffer.close
>>> method is only called when the document is closed. ScratchFileBuffer.clear
>>> is never called.
>>>
>>> These are the only places where pageHandler.markPagesAsFree is called.
>>> I believe this is the issue: since markPagesAsFree is never called, this
>>> content just keeps building up until the document is closed.
>>>
>>> Any guidance would be greatly appreciated. I can't seem to find a
>>> configuration workaround for this issue.
>>>
>>> Mark Claassen
>>>
>>>
>>> -----Original Message-----
>>> From: Mark A. Claassen <[email protected]>
>>> Sent: Wednesday, June 9, 2021 1:39 PM
>>> To: [email protected]
>>> Subject: [Possible Spam] PDF Memory issue
>>> Importance: Low
>>>
>>> Hi. Thanks for your time.
>>>
>>> I am using PDFBox and am having trouble creating large PDFs (50,000+
>>> pages). The heap size of the process is capped, but with the temp file
>>> active (which I can see being created) I didn't think this would matter.
>>>
>>> Here is what I am doing in a very condensed form:
>>> MEMORY_SETTING = MemoryUsageSetting.setupTempFileOnly();
>>> PDDocument doc = new PDDocument(MEMORY_SETTING);
>>>
>>> for (...) {
>>>     String text = [generate page text]
>>>     PDPage page = new PDPage(PDRectangle.LETTER);
>>>     try (PDPageContentStream contentStream = new
>>>             PDPageContentStream(doc, page,
>>>             PDPageContentStream.AppendMode.OVERWRITE, false)) {
>>>
>>>         contentStream.endText();
>>>         doc.addPage(page);
>>>     }
>>> }
>>>
>>> When I do a heap dump, I see over 100 MB of memory taken by 42,000
>>> instances of ScratchFileBuffer.currentPage
>>>
>>> Is there something I am doing wrong here? Or is this a bug? It seems like
>>> I must be doing something wrong / forgetting to do something, since this is
>>> a problem in 2 and 3-RC1.
>>>
>>> Thanks again,
>>>
>>> Mark Claassen
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [email protected]
>>> For additional commands, e-mail: [email protected]
