As you probably noticed, I'm looking again into improving the performance of the intermediate format: http://wiki.apache.org/xmlgraphics-fop/AreaTreeIntermediateXml
I'm trying to find out what the best course of action is. One route I've sketched out is to add an additional layer (currently named IFPainter) after the Renderer. It is basically an interface providing all primitive painting operations needed by FOP plus infrastructure for bookmarks, metadata and custom extensions. That will most probably address the issue of performance between the intermediate file and the final target file. But it does not improve the performance between layout and rendering. At least no performance penalty is to be expected from the change. Sounds good so far (except for the amount of work to be done and the added complexity).

But then I remembered my profiling session months ago and the main hotspot I found in AreaTreeParser, which is the main reason why I'm looking for a way to improve performance. AreaTreeParser does a lot of somewhat generic trait setting based on the attributes of the consumed elements (see AreaTreeParser$Handler.startElement()). That results in lots of hashCode() calls and Map operations. With the number of elements to process, even fast little methods accumulate to massive CPU consumption.

So I wondered whether we should actually revisit the area tree in the first place. Maybe the Area.props Map is not the best idea. After all, we know exactly which area tree object uses which traits. So why not have concrete getters and setters for each trait on each area tree object? Of course, that would make the AreaTreeParser bigger because you can do less in a generic way. But performance could profit from that a lot, at least when the intermediate format is used. The benefit when rendering directly is probably negligible as it hasn't shown up as a hotspot yet.
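To make the trade-off between a generic trait Map and concrete accessors a bit more tangible, here is a minimal sketch. All class names, trait codes and method names in it are made up for illustration and are not FOP's actual area tree classes; it only contrasts the two storage styles.

```java
import java.util.HashMap;
import java.util.Map;

class TraitStorageSketch {

    /** Generic variant: all traits live in one Map, keyed by a trait code. */
    static class MapBackedArea {
        // Hypothetical trait codes, invented for this sketch.
        static final Integer COLOR = 1;
        static final Integer FONT_SIZE = 2;

        private final Map<Integer, Object> props = new HashMap<>();

        void addTrait(Integer code, Object value) {
            props.put(code, value); // hashes the key; primitive values get boxed
        }

        Object getTrait(Integer code) {
            return props.get(code); // another hash lookup; the caller must cast
        }
    }

    /** Concrete variant: one typed field per trait this area type uses. */
    static class TypedTextArea {
        private String color;
        private int fontSize; // millipoints in this sketch

        void setColor(String color) { this.color = color; }
        String getColor() { return color; }
        void setFontSize(int fontSize) { this.fontSize = fontSize; }
        int getFontSize() { return fontSize; }
    }

    public static void main(String[] args) {
        MapBackedArea generic = new MapBackedArea();
        generic.addTrait(MapBackedArea.COLOR, "#000000");
        generic.addTrait(MapBackedArea.FONT_SIZE, 12000);
        int genericSize = (Integer) generic.getTrait(MapBackedArea.FONT_SIZE);

        TypedTextArea typed = new TypedTextArea();
        typed.setColor("#000000");
        typed.setFontSize(12000);

        // Both variants hold the same data; only the access path differs.
        System.out.println(genericSize == typed.getFontSize());
    }
}
```

The typed variant needs one setter call per attribute in the parser instead of a single generic loop, which is where the extra parser code would come from. Whether the saving is worth it would have to be confirmed by profiling.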
IFPainter (see http://wiki.apache.org/xmlgraphics-fop/AreaTreeIntermediateXml/NewDesign):
+ very good performance when rendering from IF
+ simpler IF format (easier to write by hand)
+ IFPainters are much easier to implement than Renderers
+ preserved backwards compatibility
+ parallel development possible without endangering stability
+ output formats can be switched individually when each IFPainter is stabilized
o No benefit for the non-IF use case which is the usual way to run FOP
- added complexity
- a lot of work
- a lot of new code is added
- added risk that some output format specialities cannot be mapped as well with IFPainter as with Renderer (mainly PCL comes to mind here, although there's a work-around available: text as bitmaps)

Improving the area tree:
+ similar performance gain might be possible
+ small performance gain possible for non-IF use case
+ chance to revisit the area tree structure and to simplify it a bit
- the old IF remains difficult to handle for people who write IF by hand
- the amount of source code in the area package increases
- backwards compatibility for renderers is not preserved. All renderers need to be touched.
- Old intermediate format will be changed which could make adjusting many test cases necessary.

Regardless of the change:
* Preparation for a structure tree and tagged PDF will have to be done at some point which has an impact on the area tree, the IF and the renderers.
* Impact of full writing mode support is still unknown.

In contrast to the above, there's an additional way to increase performance without much work: We just make use of modern multi-core CPUs. FOP is mostly single-threaded. If you look at the CPU usage in a dual-core machine, you'll see that it will stay at about 50% when rendering. If we do area tree parsing in the main thread and rendering in another we can do both at the same time. That's an easy way to decouple the two tasks.
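A rough sketch of such a two-thread split, assuming pages are handed over one at a time through a bounded queue. The Page class, the run() method and the maxPending parameter are hypothetical names for this example only, not existing FOP code. The queue is bounded so that the page source waits once too many pages are pending.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class PagePipelineSketch {

    /** Hypothetical stand-in for a finished page. */
    static final class Page {
        final int number;
        Page(int number) { this.number = number; }
    }

    /** Sentinel telling the rendering thread that no more pages follow. */
    private static final Page END = new Page(-1);

    /**
     * Builds pageCount pages on the calling thread and "renders" them on a
     * second thread. Returns the page numbers in rendering order.
     */
    static List<Integer> run(int pageCount, int maxPending) throws InterruptedException {
        BlockingQueue<Page> queue = new ArrayBlockingQueue<>(maxPending);
        List<Integer> rendered = new ArrayList<>();

        Thread renderer = new Thread(() -> {
            try {
                for (Page p = queue.take(); p != END; p = queue.take()) {
                    rendered.add(p.number); // placeholder for the real rendering work
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "renderer");
        renderer.start();

        for (int i = 1; i <= pageCount; i++) {
            queue.put(new Page(i)); // blocks while maxPending pages are waiting
        }
        queue.put(END);
        renderer.join(); // after join(), the renderer's writes are visible here
        return rendered;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(5, 2)); // prints [1, 2, 3, 4, 5]
    }
}
```

The only shared state is the queue itself, so the hand-over happens once per page and the two sides need no other locking.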
No fine-grained synchronization is needed either: it could be done per page, which is coarse enough not to create a performance problem. The only risk I see is memory consumption, as the layout engine or the area tree parser might build pages faster than the renderers can render them. But if that happens we could probably add a setting that blocks the page source if the renderer is too far behind. Based on my profiling I would estimate the performance improvement to be in the area of 50-60%, even for the non-IF case, on multi-core CPUs. Single-core CPUs will probably not profit but also not suffer.

I'm curious if anyone has thoughts on this topic as I'm having difficulties deciding on a course of action.

Jeremias Maerki
