Hey there fopsicles
Well, I just generated a 17,344-page document as PDF using FOP on Linux with JDK 1.3.1 and a 64 MB heap. I also tried processing a 34,688-page document but ran out of memory on page 33,287 (bummer!). I suspect the OOM error is due to the PDFRenderer keeping its output in RAM, but that's OK by me for the moment. It looks easy enough to write the PDF on the fly, though.
I had to make surprisingly few changes to get FOP to pipeline its output as soon as each </fo:page-sequence> is received. Of course, the code is a total hack, but I was just trying to see if I understood things properly and to prove the concept.
I intend to try this out on a more complex document tomorrow.
There does appear to be a requirement in Root.java where it looks ahead to the following page-sequence for some data to do with page numbering. The simple solution to this (IMHO) is to defer rendering of a page-sequence until its successor has also been formatted. I believe this would be simple to implement.
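To make the "defer until the successor is formatted" idea concrete, here is a minimal sketch. It is not FOP code: OneBehindPipeline and its methods are names made up for illustration, a page-sequence is stood in for by a plain String, and "rendering" just appends to a list. It also uses generics, so it needs a newer JDK than 1.3.1.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: none of these names are real FOP classes.
class OneBehindPipeline {
    // The one page-sequence that is formatted but not yet rendered.
    private String pending;
    // Stand-in for the renderer: records the order sequences were rendered in.
    private final List<String> rendered = new ArrayList<String>();

    // Called when a page-sequence has been fully formatted.
    void sequenceFormatted(String name) {
        if (pending != null) {
            // The successor now exists, so the held sequence can be rendered.
            rendered.add(pending);
        }
        pending = name;
    }

    // Called at end of document: the last sequence has no successor to wait for.
    void endDocument() {
        if (pending != null) {
            rendered.add(pending);
            pending = null;
        }
    }

    List<String> rendered() {
        return rendered;
    }
}
```

With this shape, at most two page-sequences are alive at once (the one being held and the one just formatted), which is where the memory saving would come from.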
With regard to the IDReferences: I still don't know exactly what they are for, because I haven't even tried to look in the right places, but if I'm right, a given page-sequence might refer to objects in other page-sequences using an XML ID, or something. So I figure that the way to deal with this is to keep all unresolved references in a list in the PageSequence object, and defer rendering that page-sequence (and any subsequent page-sequences) until every reference in the list is resolved. Once again, I think this is a straightforward change. It is not a perfect solution because, e.g., a table of contents presumably uses this IDReferences table, and that's normally going to be at the start of a document, so under this scheme we're back to square one.
An alternative solution would be to force drivers to be able to write pages out of sequence, so that, for example, only the contents page would be deferred until its references are resolved. This gets the memory-consuming stuff out of FOP but means the drivers are harder to write (OTOH, since most pages are just streams, it would be trivial to write a helper class to deal with out-of-order pages and reassembly). That, to me, is a large change, and I am not suggesting this course of action at this particular time, but maybe it's something to think about.
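Here is a rough sketch of the first scheme (hold a page-sequence, and everything after it, until its references resolve). Everything in it is an assumption for illustration: ReferenceGate and PendingSequence are made-up names, not FOP's PageSequence or IDReferences classes, IDs are plain Strings, and "rendering" just records the sequence name. It uses generics and ArrayDeque, so it needs a newer JDK than 1.3.1.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch only: these are not FOP's real classes.
class PendingSequence {
    final String name;
    // IDs this sequence refers to that have not been defined yet.
    final Set<String> unresolved;

    PendingSequence(String name, Set<String> referencedIds) {
        this.name = name;
        this.unresolved = new HashSet<String>(referencedIds);
    }
}

class ReferenceGate {
    private final Set<String> knownIds = new HashSet<String>();
    private final Deque<PendingSequence> queue = new ArrayDeque<PendingSequence>();
    // Stand-in for the renderer: records the order sequences were rendered in.
    private final List<String> rendered = new ArrayList<String>();

    // Called when a page-sequence has been formatted.
    void sequenceFormatted(String name, Set<String> definedIds, Set<String> referencedIds) {
        knownIds.addAll(definedIds);
        queue.addLast(new PendingSequence(name, referencedIds));
        flush();
    }

    // Render from the front of the queue for as long as the head is fully resolved.
    private void flush() {
        while (!queue.isEmpty()) {
            PendingSequence head = queue.peekFirst();
            head.unresolved.removeAll(knownIds);
            if (!head.unresolved.isEmpty()) {
                return; // head is blocked, so every later sequence waits too
            }
            rendered.add(queue.removeFirst().name);
        }
    }

    List<String> rendered() {
        return rendered;
    }

    int held() {
        return queue.size();
    }
}
```

Feeding this a "toc" sequence that references IDs defined in two later chapters shows the weakness: nothing is rendered until the last chapter has been formatted, so the whole document is held in memory, just as before.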
Anyway, I intend to follow up on this work tomorrow. I would like to look at the ID references thing and stop talking out my bottom about it, and I would like to try rendering much more complex documents to see if I've made too many assumptions.
Also, I have only modified the PDF driver; I haven't even looked at the other ones yet. The changes to the PDF driver are very minor, though.
I hope no one is offended by my work/writing on this stuff. I realise that FOP is experimental, but the number of changes is surprisingly small and the results are just so cool. Memory use is significantly reduced in all cases where there is more than one page-sequence, and total time to render seems to be significantly reduced as well. If anyone is interested in a summary of the changes I made, then drop me a line.