Exploring the FOP API design space

J.Pietschmann Fri, 31 May 2002 14:15:14 -0700

Hi foppers,
I know I should provide code instead of talking, but then...

The current FOP API suffers from a variety of deficiencies
- unexpected statefulness (most horribly embodied in
   XSLTInputHandler)
- weak abstraction of input and output channels
- incomplete separation of abstraction levels.
- cruft :-)


Some points I think should be followed on design of a
new and hopefully better API:
- Atomic initialisation. After creating a processor,
  it should be ready to run. Mandatory parametrisation
  data should be passed either to the constructor or the
  method(s) running the formatting process, everything
  else should be initialised from sensible defaults.
- No file names, anywhere. Strings representing resources
  are always URLs: on the command line, in the config file,
  everywhere. In the API, use java.io.File if files are
  deemed necessary.
- No baseDir. Define a baseURL concept. Pass all URLs
  through a resolver.
- Better abstraction of input and output channels.

Whether only an avalon component API is exposed or whether there is an
avalon-free API and a separate avalon component is a matter of
taste. In either case, I'd like to have the possibility to run a FOP
core without access to external config *files*, this means I can
create a new Driver() and can pass all config data by java properties,
service definitions and by using a user written Configuration class
passed to the Driver.configure() method for everything too complex to
be passed as properties and services (i.e. user font config). A FOP
default Configuration class could read a system and a user config
file. From what I've gathered from Avalon this is already implemented
this way there. However, I'm not sure, and I'm not dogmatic about
this.

The problem I have is the design space for abstracting input
and output channels.

  = Input =
For input, we have the javax.xml.transform.Source stuff which
provides a nice unified encapsulation of SAX, DOM and serialised
XML streams as well as SAX and DOM itself.

The nice part about the j.x.t.Source stuff is that it shields
the user from as much of the lower level XML stuff as possible,
in particular from setting up a parser in the common case of
having serialised XML as input.
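
To illustrate the point, here is a sketch of one entry point that
accepts any j.x.t.Source. The JAXP classes are real; SourceKinds and
consume() are made-up names, and the identity transform is only a
stand-in for a formatting run, not how FOP would process its input:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;

// Hypothetical demo class: three ways to wrap the same FO input.
final class SourceKinds {
    static final String FO =
        "<fo:root xmlns:fo=\"http://www.w3.org/1999/XSL/Format\"/>";

    // Serialised XML: no parser setup needed by the caller.
    static Source fromStream() {
        return new StreamSource(new StringReader(FO));
    }

    // SAX: the consumer supplies an XMLReader if none is set.
    static Source fromSax() {
        return new SAXSource(new InputSource(new StringReader(FO)));
    }

    // DOM: the caller has already built a tree.
    static Source fromDom() {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return new DOMSource(f.newDocumentBuilder()
                .parse(new InputSource(new StringReader(FO))));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Stand-in for a processor entry point: it never needs to know
    // which kind of Source it was handed.
    static String consume(Source source) {
        try {
            Transformer identity =
                TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(source, new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```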

Design choice 1:
Use j.x.t.Source as FOP input. Implement a
o.a.f.stream.XSLTStreamSource as a j.x.t.s.StreamSource subclass
for providing XSLT power. (see end of message for an interface
proposal)

Choice 2:
Provide SAX and DOM as input (getContentHandler() and render(DOM))

Choice 3:
Provide (more precise: expose) both. Redundant, but, well...

  = Output =
Next problem: output. We have two rather radically different
output types: byte streams and GUI panels.
A real stumbling block is that the object the output is
written to is volatile: it is likely to change with every
rendering run, while the kind of renderer as well as the
renderer specific configuration is more stable. This has
profound implications for the API design.

Choice 1:
The interface is at the final output level. This means
render()/run() methods for each of the classes:
   render(OutputStream) // for PDF, MIF, PS, ...
   run(UserAgent) // for AWT...
We could add a print() method if necessary.
Rationale for choosing the method names: render() means
the input FO is rendered to a byte stream. run() means
the UserAgent is started and the user can interact with it.
The run() method will return if the user somehow ends the
interaction process and shuts down the UserAgent. Do I
interpret the current state correctly?
This choice implies the renderer and any configuration
data specific to the renderer has to be passed to the
Driver (processor) through the Driver configuration
methods. Because some renderers can be assumed to have
a lot of renderer specific config data which warrants
a structure imposed on it, I'm not very fond of the whole
idea.

Choice 2:
The interface is the renderer. This means the renderer
object has to be created by the user explicitly. The
advantage is that the renderer configuration can be
designed to fit the renderer rather than to be passed
through a more generic interface at the Driver. Also,
renderer configuration and the renderer independent
processor configuration are better separated, which
might be a good idea, in particular for people who want
to render the same FO to several different output formats.
In this case, a typical code snippet would look like

   Processor p=new Processor(
    new ProcessorConfiguration(new File("myconfig.xml")));
   Renderer r=new PDFRenderer(
    new PDFRendererConfiguration("cocoon:/myPDFconfig.xml"));
   p.render(new StreamSource(new File("foo.fo")),r);
(I don't mind if the configuration is not passed to the
constructor but to a configuration() method, this is just
for illustration).

  = Reuse =
Last problem: reusing processors and renderers.
The XSLT processor of the JAXP interface and presumably
many XML parsers are throw away objects and not meant
to be reused after the "work" method (transform(), parse())
has been called.

Choice 1:
Make both processor and renderers throwaway objects. No
reset() method. Advantage: the state after the rendering
has ended can be retrieved as long as the objects are kept.
The most common use case for this which has been mentioned
on this list is inquiring the total number of pages rendered.
There are other use cases for sure.
I'm not sure how well this would fit into the avalon
component model. Can someone enlighten me?
Another consequence would be factory objects, where a user
can conveniently prepare a preconfigured template so that
repeated processor creation is simple and fast. Again, I'm
not sure if this fits well in the model with separated
processor and renderer, it is likely that the user will
create lots of identically configured processor+renderer
combinations.
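
A rough sketch of how throwaway processors and a factory could fit
together. Every name below (ThrowawayProcessor, ProcessorFactory,
getPageCount) is hypothetical, and the "rendering" is a stand-in that
only counts its inputs:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical single-use processor: state stays readable after the run.
final class ThrowawayProcessor {
    private final Map<String, String> config;
    private boolean used;
    private int pageCount;

    ThrowawayProcessor(Map<String, String> config) {
        this.config = config;
    }

    // Stand-in for a formatting run: one "page" per input element.
    void render(String[] pages) {
        if (used) {
            throw new IllegalStateException("processor is single-use");
        }
        used = true;
        pageCount = pages.length;
    }

    // Post-run state, available as long as the object is kept.
    int getPageCount() {
        if (!used) {
            throw new IllegalStateException("nothing rendered yet");
        }
        return pageCount;
    }

    String option(String key) {
        return config.get(key);
    }
}

// Hypothetical factory holding a preconfigured template.
final class ProcessorFactory {
    private final Map<String, String> template = new HashMap<>();

    ProcessorFactory set(String key, String value) {
        template.put(key, value);
        return this;
    }

    // Each processor gets its own copy of the template, so later
    // changes to the factory do not affect processors already created.
    ThrowawayProcessor newProcessor() {
        return new ThrowawayProcessor(new HashMap<>(template));
    }
}
```

With a separate renderer object the factory would have to hand out a
matching processor and renderer pair, which is where this gets less
tidy.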

Choice 2:
Make processor and/or renderers reusable by providing a
reset() method. Again, in the model with separated processor
and renderers users may be confused by having to reset two
objects. Another interesting question would be whether the
renderer is kept after resetting the processor or not.
In the first case, the renderer is a part of the processor
configuration rather than a rendering parameter and should
be passed to the constructor rather than to the rendering
method.

Choice 3:
Reusable processor with auto-reset. The disadvantage is that
no state is kept after rendering has ended. There is still
the possibly confusing problem whether a new renderer has
to be used or the old renderer is kept.

  = Caching =
Caching is an interesting topic. It comes in two flavours:
1. Caching of stuff like images within a rendering run.
2. Caching across multiple rendering runs on reused objects
The first is not only concerned with efficiency but also with
predictability. Consider
  <fo:page-sequence initial-page-number="1">
     <fo:static-content>
       <fo:external-graphic src="http://dynamic.com/curr-time.gif"/>
    ...
  <fo:page-sequence initial-page-number="20">
     <fo:static-content>
       <fo:external-graphic src="http://dynamic.com/curr-time.gif"/>
Will the two page sequences feature the same or different
pictures in the page header?
XSLT explicitly says that within a transformer run, multiple
access to the same URL results in the same content.
The other interesting question is whether object reuse implies
caching stuff like images across rendering runs. Whether this is
useful depends on how often and how much stuff is shared. The
use cases vary from rendering the same document several times
to rendering documents sharing the same logo in the header to
rendering documents at random.

Choice 1:
No caching at all, or a non-guaranteed caching. Risk reading
sources multiple times, including possibly dynamically changing
content.
Perhaps we should leave the cache problem to another application
layer. Cocoon appears to be quite good at it, no reinvention of the
wheel necessary.

Choice 2:
Guarantee a URL is only read once within a rendering run. May imply
memory problems.
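
A minimal sketch of such a per-run guarantee, assuming a cache object
that is created for one rendering run and discarded afterwards.
RunScopedCache and its loader function are hypothetical names, not FOP
API. Nothing is ever evicted, which is exactly the memory concern
mentioned above:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical cache with the lifetime of a single rendering run.
final class RunScopedCache {
    private final Map<String, byte[]> entries = new HashMap<>();
    private final Function<String, byte[]> loader;
    private int loads;

    // The loader stands in for whatever actually fetches the URL.
    RunScopedCache(Function<String, byte[]> loader) {
        this.loader = loader;
    }

    // The first request for a URL invokes the loader; every later
    // request in the same run returns the stored bytes.
    byte[] get(String url) {
        return entries.computeIfAbsent(url, u -> {
            loads++;
            return loader.apply(u);
        });
    }

    // Number of times the loader was actually invoked.
    int loadCount() {
        return loads;
    }
}
```

With this, both page sequences in the example above would show the
same picture, because the second fo:external-graphic is served from
the stored copy.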

Choice 3:
Expose caching across multiple renderings on a reused object.
Needs an API for Cache control.
(My opinion: not recommended).

  = Conclusions =
Ok, concrete proposals for the new interface, tentatively
called Processor, for various combinations of the design
changes regarding output abstraction and reuse. (I use
j.x.t.Source for input, this does not mean I'm biased to
this. Ok, I am :-) )

1. Output is physical. Throw away.
  class Processor {
    // default renderer, may adapt to output type
    Processor()
    // configuration includes renderer choice
    Processor(Configuration)
    run( Source s, UserAgent ua)
    render( Source s, OutputStream o)
  }

2. Alternative with separate configuration method
  class Processor {
    Processor()
    configure(Configuration)
    run( Source s, UserAgent ua)
    render( Source s, OutputStream o)
  }

3. Output is physical. Alternative for avoiding calling an
  explicitly configured renderer with an improper output type
  class Processor {
    Processor(Source s, UserAgent ua)
    Processor(Source s, OutputStream o)
    Processor(Source s, UserAgent ua, Configuration)
    Processor(Source s, OutputStream o, Configuration)
    run()
  }

4. Alternative with separate configuration method
  class Processor {
    Processor(Source s, UserAgent ua)
    Processor(Source s, OutputStream o)
    configure(Configuration)
    render() // or run()
  }

5-8. Add a reset() which resets both processor and renderer
  to any of the alternatives above.

9. Output is Renderer. Throw away. Not well suited for Factory.
  class Processor {
    Processor()
    Processor(Configuration)
    render( Source s, Renderer r)
  }
  class PDFRenderer {
    PDFRenderer(OutputStream o)
    PDFRenderer(OutputStream o, Configuration)
  }

10. Add reset() to 9.
11. Variants for Renderer output and Factory approach
  omitted (look ugly). Add your own proposals

  = Further activity =
Well, I suppose there will be a consensus built:
- Whether to expose
  1. Avalon component interface only
  2. Both Avalon and non-avalon interface
  3. Non-Avalon interface only
- Design variant for input channel
- Design variant for output channel
- Design variant for object reuse
- Whether to provide a factory (if appropriate)
I hope this happens within the next week.
I will then post a detailed interface to the list. I hope
someone will help me to avalonise this, if necessary.
After the interface is voted on, I'll implement this,
with the objective to have running code in August. The
current interface should be deprecated but kept for a
few maintenance releases.

Is this ok?

J.Pietschmann

