Hi Clement

On 16.02.2011 04:38:45 Clement Jebakumar (RBEI/EMT2) wrote:
> Hello,
>
> I have seen many place people discussed about memory issues with FOP handling
> very large files.
> After setting conserver memory flag to true, also enabled File based
> streaming object in stream factory, the issue seems to be exist. I know
> because of limitation in FOP it cannot be handled.
> I was trying to convert a file of 400MB XML file to PDF, the result PDF will
> have nearly more than 10,000 pages.
> Keeping many page-sequence also didn't help, because a table is spanning
> nearly for 100 Pages :-(

I'm still wondering what the purpose of a table is that spans 100+ pages.
- No one's ever going to read it through.
- It's a waste of paper and therefore resources (think greener).
- The raw data is best offered in CSV or XML format so you can actually
  do something useful with it. Idea: print it on paper as a series of
  linked PDF417 or DataMatrix barcodes if electronic transmission is not
  possible.

Yeah, I know accountants like that kind of stuff but it is just stupid.
Had to be said. Sorry. And yeah, FOP should still be able to handle
these large tables at some point. Sigh.

> I have looked at the flow. Still some where I am getting lost. So is
> there a way to inject persistency in the Layouts(FO Tree) and Document
> Handler? Because I am willing to do it.

Well, the "conserve memory policy" is basically as much as we can easily
do at the moment. It temporarily stores pages with unresolved forward
references on disk. But that only reduces the memory usage a bit.

The big problem right now is the use of the total-fit algorithm for
page breaking, which likes to look at a number of pages at the same time
to optimize the breaks. Good for line breaking but bad for page
breaking. I constantly regret not having noticed the consequences of
that choice back in 2005. Furthermore, layout currently starts only
after a full page-sequence has been parsed into the FO tree. So the
whole 100 pages' worth of data is already in memory before the layout
even starts.

My take (personal opinion):
- FOP would need a first-fit or best-fit algorithm for page breaking.
  (Today, I don't believe that total-fit is really beneficial for page
  breaking.)
- The page breaker needs to be able to operate while the FO tree is
  still being built.
- FO tree objects that have been fully processed need to be released
  while layout is still running.

Those familiar with the layout engine will know that this is much, much
more than a weekend project. That's the hard reality of it.
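
To illustrate the first-fit idea from the list above, here is a minimal
sketch. It is not FOP code: the class, the method and the reduction of
a table row to a single height value are all hypothetical
simplifications. The point is only that each page is final as soon as
the next row no longer fits, so nothing beyond the current page has to
stay in memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Illustration only, not part of FOP. All names are hypothetical. */
public class FirstFitPageBreaker {

    /**
     * Consumes row heights one at a time (for example while the input is
     * still being parsed) and returns the number of rows placed on each page.
     */
    public static List<Integer> breakPages(Iterator<Integer> rowHeights, int pageHeight) {
        List<Integer> rowsPerPage = new ArrayList<Integer>();
        int used = 0;        // height already consumed on the current page
        int rowsOnPage = 0;  // rows placed on the current page so far
        while (rowHeights.hasNext()) {
            int height = rowHeights.next();
            // First fit: close the page as soon as the next row does not fit.
            // A row taller than the page still gets a page of its own.
            if (rowsOnPage > 0 && used + height > pageHeight) {
                // The page is final here; a real implementation could
                // render it and release its rows at this point.
                rowsPerPage.add(rowsOnPage);
                used = 0;
                rowsOnPage = 0;
            }
            used += height;
            rowsOnPage++;
        }
        if (rowsOnPage > 0) {
            rowsPerPage.add(rowsOnPage); // last, possibly partial page
        }
        return rowsPerPage;
    }

    public static void main(String[] args) {
        List<Integer> rows = Arrays.asList(30, 30, 30, 50, 60, 20, 90);
        System.out.println(breakPages(rows.iterator(), 100));
    }
}
```

A total-fit breaker would instead have to keep the candidate breaks for
the whole sequence around before committing to any page, which is
exactly what hurts with a 100-page table. The sketch ignores keeps,
footnotes, headers/footers and rows that break across pages.
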
If there is a softer step towards the goal, I can't see it.

> Please give any suggestions or ideas.
>
> Mit freundlichen Grüßen / Best regards,
> Clement Jebakumar.C (RBEI/EMT2)
>

Jeremias Maerki
