First, I have successfully tested my changes with a ~20,000 page document of moderate complexity. It is one of the example documents (I forget which one, doh) multiplied by some silly factor like 5000. I have also successfully tested my changes against a 120-odd page version of 'docs/examples/svg/embedding.fo': that three-page file repeated forty-something times. I would try it on a bigger file but I need to sleep instead; I'll set one of my machines going on a really big file tomorrow just to see where the limits are. I haven't looked at the heap size growth yet, but I expect a very small linear growth, since the PDF still has to maintain a list of references for each page. Still, that's quite tiny and can probably be optimized significantly if it becomes a problem.
I have some questions regarding the structure of FOP and how my changes might be integrated into the main tree, assuming of course that they are desired and the PDF page parent issue can be sorted out.
Firstly, it could be argued that I have violated one of the secondary design principles of FOP, namely that the separation between parsing/formatting/rendering has suffered a warp core breach. In fact I think I have managed to keep the separation pretty clear, but at some point control has to move from the parser to the formatter/renderer, and this is no longer done by the caller. By this I mean that the thread running SAX is now also running the formatting and rendering. Does anyone see this as a huge problem? (What is the plan for FOP 0.20 anyway?)
At the moment I am doing this with a horrid hack in FOTreeBuilder, because that's the most obvious place to detect where an fo:page-sequence element ends. I also initialize and close the renderer in FOTreeBuilder, using some new methods I had to add (PDF-specific at the moment). I am not certain that FOTreeBuilder is the right place for this code, but it might be. Any suggestions? I actually like the idea of a bridging class, ParserRenderer or something, that FOTreeBuilder uses.
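To make the bridging idea a bit more concrete, here is a minimal sketch of the control flow. It is not the actual code: it uses plain JAXP/SAX rather than anything from FOP, and the names PageSequenceBridgeSketch, PageSequenceListener and Bridge are made up for illustration. The listener only logs the points where the renderer would be opened, where a finished page sequence would be formatted and rendered, and where the renderer would be closed, all on the thread that runs the parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class PageSequenceBridgeSketch {
    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    /** Hypothetical callback: stands in for the formatter/renderer side. */
    public interface PageSequenceListener {
        void rendererOpened();
        void pageSequenceComplete(int index);
        void rendererClosed();
    }

    /** Hypothetical bridge: a SAX handler that hands control over per page sequence. */
    public static class Bridge extends DefaultHandler {
        private final PageSequenceListener listener;
        private int count = 0;

        public Bridge(PageSequenceListener listener) {
            this.listener = listener;
        }

        @Override
        public void startDocument() {
            listener.rendererOpened();
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // The end of fo:page-sequence is where formatting and rendering
            // would run, so the finished subtree can be released afterwards.
            if (FO_NS.equals(uri) && "page-sequence".equals(localName)) {
                count++;
                listener.pageSequenceComplete(count);
            }
        }

        @Override
        public void endDocument() {
            listener.rendererClosed();
        }
    }

    /** Parses the given XML and returns the order in which callbacks fired. */
    public static List<String> run(String xml) throws Exception {
        final List<String> log = new ArrayList<String>();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed to get uri/localName in endElement
        factory.newSAXParser().parse(
            new InputSource(new StringReader(xml)),
            new Bridge(new PageSequenceListener() {
                public void rendererOpened() { log.add("open"); }
                public void pageSequenceComplete(int index) { log.add("sequence " + index); }
                public void rendererClosed() { log.add("close"); }
            }));
        return log;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<fo:root xmlns:fo=\"" + FO_NS + "\">"
            + "<fo:page-sequence/><fo:page-sequence/></fo:root>";
        for (String line : run(xml)) {
            System.out.println(line);
        }
    }
}
```

With something along these lines the parser-side class would only need to know about the listener interface, so the separation between parsing and formatting/rendering stays visible even though one thread does both.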
Secondly, because of the change in the rendering sequence I am afraid that other renderers might not work any more, especially not the AWT renderer. I was wondering if anyone would like to help me with these other renderers? The changes are fairly straightforward. I am happy to work on the AWT renderer and can probably work out the PCL renderer, but if my code might be accepted into the tree then it would probably be beneficial if someone else knew broadly what I've been up to. I don't have any PCL devices handy so I would have some difficulty testing any changes I make. And I'm not even sure what MIF is, some sort of interchange format .. ?
I would also like to hear from anyone who would like to test the changes for me. I am not extensively well-versed in XSL-FO, PDF, XML or anything else for that matter, and it would be good to find out if I've broken anything serious or if there are any remaining cases where references are being held too long and we have memory leaks. (Are there automated tests?)
I'm also interested in some performance comparisons: we allocate a lot less memory now and GC takes much less time, so maybe someone would be interested in measuring that. I can provide the source and/or .jar files for people to play with; drop me a line. (Note: the source, apart from the renderer, is horrible. Tomorrow I will be cleaning it up and working out the data flows properly.)
Anyway, sorry for my verbose discussion of these things; it seems I'm a bit of a letter writer by nature.