Hi Craig,

Before you start working on that, look at
https://issues.apache.org/bugzilla/show_bug.cgi?id=47314
Ben Wuest did some work there to start rendering without a finished page sequence.

Regards,

Georg Datterl

------ Contact ------

Georg Datterl

Geneon media solutions gmbh
Gutenstetter Straße 8a
90449 Nürnberg

HRB Nürnberg: 17193
Managing Director: Yong-Harry Steiert

Tel.: 0911/36 78 88 - 26
Fax: 0911/36 78 88 - 20

www.geneon.de

Other members of the Willmy MediaGroup:

IRS Integrated Realization Services GmbH: www.irs-nbg.de
Willmy PrintMedia GmbH: www.willmy.de
Willmy Consult & Content GmbH: www.willmycc.de

-----Original Message-----
From: Craig Ringer [mailto:[email protected]]
Sent: Friday, 10 September 2010 10:14
To: [email protected]
Cc: Georg Datterl
Subject: Re: AW: Memory Leak issue -- FOP

On 09/10/2010 03:44 PM, Georg Datterl wrote:
> Hi Hamed,
>
> I did some pretty large publications with lots of images. 1500 pages
> took 2GB memory, after I put some effort in memory optimization. The
> only FOP-related issue I found was image caching and that can be
> disabled. I'm quite sure I would have found a memory leak in FOP,
> especially one related to ordinary LayoutManagers. So either make your
> page-sequences shorter or give fop more memory.

I can't help but wonder if FOP needs to keep the whole page sequence in memory, at least for PDF output. Admittedly I haven't verified that it *is* keeping everything in RAM, but that's certainly a whole lot of RAM for a moderate-sized document.

I've been meaning to look at how FOP is doing its PDF generation for a while, but I've been head-down trying to finish a web-based UI for work first. I do plan to look at it though, as I've done a fair bit of work on PDF generation libraries and I'm curious about how FOP is doing it (and how much wheel-reinvention might be going on).

Anyway, PDF is *designed* for streaming output, so huge PDFs can be produced using only very small amounts of memory with a bit of thought into how the output works.
I've had no issues generating multi-hundred-megabyte PDF documents with very small amounts of RAM using PoDoFo, a C++ PDF library that supports direct-to-disk PDF generation.

There are all sorts of tricks you can do. The most important is of course that you can make backward or forward indirect references to almost any object, with no constraints on object order in the document. You can write whatever you generate out very aggressively. You can even split your content stream(s) for each page into multiple segments so you can write the content stream out when it gets too big. Or write the content stream to a tempfile, then merge it into the PDF after the other resources for the page have been written.

There should be no need for image caching, because once you've written the image object to the PDF once, you can just reference it again in later pages. Not only does that save RAM but it makes your PDF smaller and faster. It works even if your image is used in different sizes, scales, etc. in different parts of the document, because you can crop and scale using content-stream instructions.

You don't even have to keep the page dictionaries in RAM. You can write them out when the page is done (or before). Because forward indirect references are permitted, if you have content on the page that's yet to be generated you can reserve some object IDs for those content streams and output indirect references to the as-yet nonexistent content streams in the page dictionary.

About the only time I can think of when you have to keep something in memory (or at least, in a tempfile) is when you have content in a page (like total page counts) that cannot be generated until later in the document, and that may re-flow the rest of the page's content. If the late-generated content won't force a reflow it can just be put in a separate content stream with a forward reference.

Admittedly, I'm speaking only about the actual PDF generation.
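
To make that PDF-generation side concrete, here is a rough Python sketch of the tricks described above. It is only an illustration under stated assumptions, not how FOP or PoDoFo actually do it: the names `StreamingPdfWriter`, `reserve`, `write_object`, `write_stream`, `finish` and `write_document` are invented for the example, and a 1x1 grey pixel stands in for real image data. Every object is written the moment it is complete, the image is written once and reused at a different scale on each page, and each page dictionary goes out before the content stream it points to.

```python
import io


class StreamingPdfWriter:
    """Hypothetical helper: writes each object at once, keeps only byte offsets."""

    def __init__(self, out):
        self.out, self.pos = out, 0
        self.offsets = {}        # object number -> byte offset, for the xref table
        self.next_id = 1
        self._write(b"%PDF-1.4\n")

    def _write(self, data):
        self.out.write(data)
        self.pos += len(data)

    def reserve(self):
        """Hand out an object number before the object itself exists."""
        self.next_id += 1
        return self.next_id - 1

    def write_object(self, obj_id, body):
        """Emit one indirect object and remember where it starts."""
        self.offsets[obj_id] = self.pos
        self._write(b"%d 0 obj\n" % obj_id + body + b"\nendobj\n")

    def write_stream(self, obj_id, data, extra=b""):
        head = b"<< %s/Length %d >>\nstream\n" % (extra, len(data))
        self.write_object(obj_id, head + data + b"\nendstream")

    def finish(self, root_id):
        """Write the cross-reference table and trailer from the saved offsets."""
        xref_pos, size = self.pos, self.next_id
        self._write(b"xref\n0 %d\n0000000000 65535 f \n" % size)
        for obj_id in range(1, size):
            self._write(b"%010d 00000 n \n" % self.offsets[obj_id])
        self._write(b"trailer\n<< /Size %d /Root %d 0 R >>\nstartxref\n%d\n%%%%EOF\n"
                    % (size, root_id, xref_pos))


def write_document(out, page_count):
    w = StreamingPdfWriter(out)
    catalog_id, pages_id, image_id = w.reserve(), w.reserve(), w.reserve()
    w.write_object(catalog_id, b"<< /Type /Catalog /Pages %d 0 R >>" % pages_id)
    # The image is written exactly once (placeholder data: one grey pixel).
    w.write_stream(image_id, b"\x80",
                   b"/Type /XObject /Subtype /Image /Width 1 /Height 1 "
                   b"/ColorSpace /DeviceGray /BitsPerComponent 8 ")
    kids = []
    for n in range(1, page_count + 1):
        page_id, content_id = w.reserve(), w.reserve()
        # The page dictionary goes out first; /Contents is a forward reference.
        w.write_object(page_id,
                       b"<< /Type /Page /Parent %d 0 R /MediaBox [0 0 612 792] "
                       b"/Resources << /XObject << /Im0 %d 0 R >> >> /Contents %d 0 R >>"
                       % (pages_id, image_id, content_id))
        size = 50 * n   # same image object, scaled differently on every page
        w.write_stream(content_id, b"q %d 0 0 %d 72 72 cm /Im0 Do Q" % (size, size))
        kids.append(page_id)
    # The page tree root was referenced by every page but is only written now.
    kid_refs = b" ".join(b"%d 0 R" % k for k in kids)
    w.write_object(pages_id, b"<< /Type /Pages /Count %d /Kids [%s] >>"
                   % (len(kids), kid_refs))
    w.finish(catalog_id)
    return w


if __name__ == "__main__":
    buf = io.BytesIO()   # any writable binary stream works, e.g. open("out.pdf", "wb")
    write_document(buf, 3)
    print(len(buf.getvalue()), "bytes written")
```

The only state that grows with the document here is one integer offset per object plus the list of page object numbers, which is the sense in which the output side can stay cheap. Whether the layout stage can feed pages to such a writer progressively is a separate question.
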
It may well be that generating the AT/IF is inherently demanding of resident RAM, or that the IF/AT don't contain enough information to generate pages progressively. The point, though, is that PDF output shouldn't use much RAM if the PDF output code is using PDF features to make it efficient.

Sometimes it's a trade-off between how efficient the produced PDF is and how efficient its creation is, but you can always post-process (optimize) a PDF once it's generated if you want to do things like linearize it for fast web loading.

--
Craig Ringer

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
