Re: Performance of the intermediate format and the area tree

Jeremias Maerki Thu, 22 Nov 2007 08:01:50 -0800

On 22.11.2007 16:23:36 Chris Bowditch wrote:
> Jeremias Maerki wrote:
> 
> Hi Jeremias,
> 
> > As you probably noticed, I'm looking again into improving the
> > performance of the intermediate format:
> > http://wiki.apache.org/xmlgraphics-fop/AreaTreeIntermediateXml
> 
> Thanks for that. Looks good so far!
> 
> <snip/>
> 
> > 
> > IFPainter (see 
> > http://wiki.apache.org/xmlgraphics-fop/AreaTreeIntermediateXml/NewDesign):
> > + very good performance when rendering from IF
> > + simpler IF format (easier to write by hand)
> > + IFPainters are much easier to implement than Renderers
> > + preserved backwards compatibility
> > + parallel development possible without endangering stability
> > + output formats can be switched individually when each IFPainter is
> > stabilized
> > o No benefit for the non-IF use case which is the usual way to run FOP
> > - added complexity
> > - a lot of work
> > - a lot of new code is added
> > - added risk that some output format specialities cannot be mapped as
> > well with IFPainter as with Renderer. (mainly PCL comes to mind here
> > although there's a work-around available (text as bitmaps))
> 
> A simple test: count the pros and cons and add the counts together: 
> +6 - 4 = +2
> 
> > 
> > Improving the area tree:
> > + similar performance gain might be possible
> > + small performance gain possible for non-IF use case
> > + chance to revisit the area tree structure and to simplify it a bit
> > - the old IF remains difficult to handle for people who write IF by hand
> > - the amount of source code in the area package increases
> > - backwards compatibility for renderers is not preserved. All renderers
> > need to be touched.
> > - Old intermediate format will be changed which could make adjusting
> > many test cases necessary.
> 
> +3-4=-1
> 
> That is a simplistic way of looking at it, but at least it's objective 
> rather than subjective.


Uhm, with these things you can always manipulate the plus/minus items so
that the calculation results in the desired value. I've seen the bad results
of such an approach. It's VERY difficult to make this objective.

> My personal preference is for the first 
> approach because it will remain so difficult to write/modify IF XML in 
> approach 2.

That's why I lean toward approach 1 myself. But besides the compatibility
issue, that's the only reason. Still, I had to take a step back and look
at alternative approaches.

> > 
> > Regardless of the change:
> > * Preparation for a structure tree and tagged PDF will have to be done
> > at some point which has an impact on the area tree, the IF and the
> > renderers.
> > * Impact of full writing mode support is still unknown.
> > 
> > 
> > In contrast to the above, there's an additional way to increase
> > performance without much work:
> > We just make use of modern multi-core CPUs. FOP is mostly
> > single-threaded. If you look at the CPU usage in a dual-core machine,
> > you'll see that it will stay at about 50% when rendering. If we do area
> > tree parsing in the main thread and rendering in another we can do both
> > at the same time. That's an easy way to decouple the two tasks. There's
> > also no fine-grained synchronization as it could be done per page which
> > is coarse enough not to create a performance problem. The only risk I
> > see is memory consumption as the layout engine or the area tree parser
> > might be faster to build pages than the renderers can render them. But
> > if that happens we could probably add a setting that blocks the page
> > source if the renderer is too far behind. Based on my profiling I would
> > estimate the performance improvement to be in the area of 50-60%, even
> > for the non-IF case, on multi-core CPUs. Single core CPUs will probably
> > not profit but also not suffer.
> 
> -1 to this approach for several reasons:
> 
> 1) Anyone integrating FOP via the API will not be able to plug into a 
> J2EE container because spawning threads is not allowed in an EJB.

Sorry, but that doesn't count. FOP already violates the J2EE spec today:
- It spawns threads.
- It accesses local files through java.io.File (although that can be
worked around by proper usage (i.e. URIResolver & Co.)).
- Some plug-ins (most notably Batik) also spawn new threads.

If anyone wants to run FOP in a J2EE-conformant way, they have to put it
in a JCA wrapper.

> 2) Keeping FOP single threaded means the choice to multi-thread is left 
> with the API Integrator. It is very easy to get FOP to use the full CPU 
> power of multi-cores. I have a test app which is only about 100 lines of 
> code which starts multiple threads. Each thread pulls documents from a pool.

Yes, that's a common approach. But it doesn't help with print file
generation from intermediate files where you cannot parallelize so
easily if you are not allowed to create multiple print files for one job.
If that were possible we wouldn't necessarily discuss this topic now. Note
that the parallelizing of the rendering could also be made optional, if
someone wants to disable it. At any rate, people running FOP in a
single-document context (documentation for example) could profit from
the speed-up. CPUs became multi-core because it has become difficult to
make single cores much faster. At some point, developers have to
adjust to that.
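
For anyone following along, the document-level parallelism Chris
describes might look roughly like the sketch below. It is only an
illustration: DocumentPoolSketch, renderDocument() and renderAll() are
made-up names, not FOP API, and renderDocument() is merely a stand-in for
one complete single-threaded FOP run. It uses java.util.concurrent, so it
assumes Java 1.5 or the backport.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: one single-threaded run per document, many at once. */
public class DocumentPoolSketch {

    /** Placeholder for a complete FOP run on one document (assumption). */
    static String renderDocument(String name) {
        return name + ".pdf";
    }

    /** Renders all documents on a fixed pool and returns results in input order. */
    public static List<String> renderAll(List<String> documents, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<Future<String>>();
            for (final String doc : documents) {
                // Each task is independent, so no shared state needs locking.
                futures.add(pool.submit(new Callable<String>() {
                    public String call() {
                        return renderDocument(doc);
                    }
                }));
            }
            List<String> results = new ArrayList<String>();
            for (Future<String> future : futures) {
                results.add(future.get()); // waits for that document to finish
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while rendering", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("rendering failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

This scales nicely across many documents, but a single print job that
must end up in one output file is still one task and stays on one core.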

> 3) Code maintenance becomes a nightmare in a multithreaded application. 
> The synchronization might look simple at first but it quickly becomes 
> very, very difficult. 3 of us at my company recently spent 3 months 
> troubleshooting the bugs out of a multithreaded application which is 
> somewhat smaller and less complex than FOP. This point is re-iterated by 
> Andreas' recent efforts to fix synchronization of the Cache Cleaner 
> threads in the Property Cache. It looks simple in theory but in practice 
> turns out to be more trouble than it's worth.

That's a valid point. Concurrency with plain Java 1.4 is tricky indeed,
but toolsets like util.concurrent and JSR166 (Java 1.5 java.util.concurrent)
help a lot in making this stuff easier.

http://gee.cs.oswego.edu/dl/classes/EDU/oswego/cs/dl/util/concurrent/intro.html
http://backport-jsr166.sourceforge.net/
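
To make the page-level hand-off a bit more concrete, here is a minimal
sketch, assuming Java 1.5's java.util.concurrent (or the backport). All
names (PageHandOffSketch, process(), maxPending) are hypothetical and the
"pages" are just strings; nothing here is existing FOP code. The bounded
queue plays the role of the setting that blocks the page source when the
renderer falls too far behind.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical sketch of a per-page hand-off between two threads. */
public class PageHandOffSketch {

    /** Unique marker object telling the render thread that no more pages follow. */
    private static final String END_OF_DOCUMENT = new String("<end>");

    /**
     * Builds placeholder pages in the calling thread and "renders" them in a
     * second thread. maxPending (must be >= 1) limits how many finished pages
     * may wait for the renderer, which bounds memory consumption.
     */
    public static List<String> process(int pageCount, int maxPending) {
        final BlockingQueue<String> pending =
                new ArrayBlockingQueue<String>(maxPending);
        final List<String> rendered = new ArrayList<String>();

        Thread renderer = new Thread(new Runnable() {
            public void run() {
                try {
                    while (true) {
                        String page = pending.take(); // waits for the next page
                        if (page == END_OF_DOCUMENT) {
                            return;
                        }
                        rendered.add("rendered " + page);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "renderer");
        renderer.start();

        try {
            for (int i = 1; i <= pageCount; i++) {
                // put() blocks while maxPending pages are already queued,
                // so the page source cannot run arbitrarily far ahead.
                pending.put("page " + i);
            }
            pending.put(END_OF_DOCUMENT);
            renderer.join(); // after join() the rendered list is safe to read
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during hand-off", e);
        }
        return rendered;
    }
}
```

The only synchronization point is the queue itself, once per page. Calling
PageHandOffSketch.process(5, 2) returns the five pages in order while never
letting more than two of them wait in the queue.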

Still sticking to the -1 for that one? :-)

Thanks for your feedback!

Jeremias Maerki
