cruft ?

David B. Bitton

Code Made Fresh Daily™
----- Original Message ----- 
From: "J.Pietschmann" <[EMAIL PROTECTED]>
To: "fop dev" <[EMAIL PROTECTED]>
Sent: Friday, May 31, 2002 5:34 PM
Subject: Exploring the FOP API design space

> Hi foppers,
> I know I should provide code instead of talking, but then...
> The current FOP API suffers from a variety of deficiencies
> - unexpected statefulness (most horribly embodied in
>    XSLTInputHandler)
> - weak abstraction of input and output channels
> - incomplete separation of abstraction levels.
> - cruft :-)
> Some points I think should be followed on design of a
> new and hopefully better API:
> - Atomic initialisation. After creating a processor,
>   it should be ready to run. Mandatory parametrisation
>   data should be passed either to the constructor or the
>   method(s) running the formatting process, everything
>   else should be initialised from sensible defaults.
> - No file names, anywhere. Strings representing resources
>   are always URLs, on the command line, in the config file,
>   everywhere. In the API, use if files are
>   deemed necessary.
> - No baseDir. Define a baseURL concept. Pass all URL
>   through a resolver.
> - Better abstraction of input and output channels.
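
A minimal sketch of the baseURL idea, assuming a small helper class. BaseUrlResolver is a made-up name, not an existing FOP class; only java.net.URI is real API.

```java
import java.net.URI;

// Hypothetical helper, not part of FOP: every resource string is treated
// as a URL reference and resolved against one base URL (no baseDir).
final class BaseUrlResolver {
    private final URI base;

    BaseUrlResolver(String baseUrl) {
        this.base = URI.create(baseUrl);
    }

    // Relative references are resolved against the base URL;
    // absolute URLs are returned unchanged.
    URI resolve(String reference) {
        return base.resolve(reference);
    }
}
```

With the base "http://example.org/docs/main.fo", the reference "images/logo.png" would resolve to "http://example.org/docs/images/logo.png".
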
> Whether only an avalon component API is exposed or whether there is an
> avalon-free API and a separate avalon component is a matter of
> taste. In either case, I'd like to have the possibility to run a FOP
> core without access to external config *files*, this means I can
> create a new Driver() and can pass all config data by java properties,
> service definitions and by using a user written Configuration class
> passed to the Driver.configure() method for everything too complex to
> be passed as properties and services (i.e. user font config). A FOP
> default Configuration class could read a system and a user config
> file. From what I've gathered from Avalon this is already implemented
> this way there. However, I'm not sure, and I'm not dogmatic about
> this.
> The problem I have is the design space for abstracting input
> and output channels.
>   = Input =
> For input, we have the javax.xml.transform.Source stuff which
> provides a nice unified encapsulation of SAX, DOM and serialised
> XML streams as well as SAX and DOM itself.
> The nice part about the j.x.t.Source stuff is that it shields
> the user from as much of the lower level XML stuff as possible,
> in particular from setting up a parser in the common case of
> having serialised XML as input.
> Design choice 1:
> Use j.x.t.Source as FOP input. Implement a
> as a j.x.t.s.StreamSource subclass
> for providing XSLT power. (see end of message for an interface
> proposal)
> Choice 2:
> Provide SAX and DOM as input (getContentHandler() and render(DOM))
> Choice 3:
> Provide (more precise: expose) both. Redundant, but, well...
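
One way to see why j.x.t.Source is attractive as input: an identity transformer from JAXP can turn any Source (stream, DOM or SAX) into SAX events for a single ContentHandler. The sketch below assumes the formatter consumes SAX events. ElementCountingHandler and SourceFeeder are hypothetical names, and the handler is only a stand-in that counts elements.

```java
import java.io.StringReader;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for an FO tree builder: it only counts elements.
final class ElementCountingHandler extends DefaultHandler {
    int elements;

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) {
        elements++;
    }
}

final class SourceFeeder {
    // An identity transformer converts any Source into SAX events,
    // so the caller never has to set up a parser.
    static void feed(Source source, ContentHandler handler) {
        try {
            TransformerFactory.newInstance().newTransformer()
                    .transform(source, new SAXResult(handler));
        } catch (TransformerException e) {
            throw new IllegalStateException("could not read input", e);
        }
    }

    // Convenience for serialised XML held in a string.
    static int countElements(String xml) {
        ElementCountingHandler handler = new ElementCountingHandler();
        feed(new StreamSource(new StringReader(xml)), handler);
        return handler.elements;
    }
}
```

The same feed() call would accept a DOMSource or SAXSource unchanged, which is the point of choice 1.
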
>   = Output =
> Next problem: output. We have two rather radically different
> output types: byte streams and GUI panels.
> A real stumbling block is that the object the output is
> written to is volatile: it is likely to change with every
> rendering run, while the kind of renderer as well as the
> renderer specific configuration is more stable. This has
> profound implications for the API design.
> Choice 1:
> The interface is at the final output level. This means
> render()/run() methods for each of the classes:
>    render(OutputStream) // for PDF, MIF, PS, ...
>    run(UserAgent) // for AWT...
> We could add a print() method if necessary.
> Rationale for choosing the method names: render() means
> the input FO is rendered to a byte stream. Run() means,
> the UserAgent is started and the user can interact with it.
> The run() method will return if the user somehow ends the
> interaction process and shuts down the UserAgent. Do I
> interpret the current state correctly?
> This choice implies the renderer and any configuration
> data specific to the renderer has to be passed to the
> Driver (processor) through the Driver configuration
> methods. Because some renderers can be assumed to have
> a lot of renderer specific config data which warrants
> a structure imposed on it, I'm not very fond of the whole
> idea.
> Choice 2:
> The interface is the renderer. This means the renderer
> object has to be created by the user explicitly. The
> advantage is that the renderer configuration can be
> designed to fit the renderer rather than to be passed
> through a more generic interface at the Driver. Also,
> renderer configuration and the renderer independent
> processor configuration are better separated, which
> might be a good idea, in particular for people who want
> to render the same FO to several different output formats.
> In this case, a typical code snippet would look like
>    Processor p=new Processor(
>     new ProcessorConfiguration(new File("myconfig.xml")));
>    Renderer r=new PDFRenderer(
>     new PDFRendererConfiguration("cocoon:/myPDFconfig.xml"));
>    p.render(new StreamSource(new File("")),r);
> (I don't mind if the configuration is not passed to the
> constructor but to a configuration() method, this is just
> for illustration).
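
A toy sketch of output choice 2, to make the separation concrete. All names (SketchRenderer, PlainTextRenderer, SketchProcessor) are hypothetical, and a list of text lines stands in for the FO input. The renderer carries its own configuration, the processor carries only renderer-independent configuration, and the same input can go to several renderers.

```java
import java.util.List;

// Hypothetical names throughout; this is not the FOP API.
interface SketchRenderer {
    void renderPage(int pageNumber, List<String> lines);
}

// Renderer-specific configuration (here: a header prefix) lives with the renderer.
final class PlainTextRenderer implements SketchRenderer {
    private final StringBuilder out;
    private final String headerPrefix;

    PlainTextRenderer(StringBuilder out, String headerPrefix) {
        this.out = out;
        this.headerPrefix = headerPrefix;
    }

    @Override
    public void renderPage(int pageNumber, List<String> lines) {
        out.append(headerPrefix).append(pageNumber).append('\n');
        for (String line : lines) {
            out.append(line).append('\n');
        }
    }
}

// Renderer-independent configuration (here: lines per page) lives with the processor.
final class SketchProcessor {
    private final int linesPerPage;

    SketchProcessor(int linesPerPage) {
        if (linesPerPage < 1) {
            throw new IllegalArgumentException("linesPerPage must be positive");
        }
        this.linesPerPage = linesPerPage;
    }

    // Splits the input into pages, hands each page to the renderer,
    // and returns the number of pages produced.
    int render(List<String> lines, SketchRenderer renderer) {
        int pages = 0;
        for (int start = 0; start < lines.size(); start += linesPerPage) {
            int end = Math.min(start + linesPerPage, lines.size());
            renderer.renderPage(++pages, lines.subList(start, end));
        }
        return pages;
    }
}
```

Rendering the same input twice only needs a second renderer object; the processor configuration is untouched.
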
>   = Reuse =
> Last problem: reuse processors and renderers.
> The XSLT processor of the JAXP interface and presumably
> many XML parsers are throw away objects and not meant
> to be reused after the "work" method (transform(), parse())
> has been called.
> Choice 1:
> Make both processor and renderers throwaway objects. No
> reset() method. Advantage: the state after the rendering
> has ended can be retrieved as long as the objects are kept.
> The most common use case for this which has been mentioned
> on this list is inquiring the total number of pages rendered.
> There are other use cases for sure.
> I'm not sure how well this would fit into the avalon
> component model. Can someone enlighten me?
> Another consequence would be factory objects, where a user
> can conveniently prepare a preconfigured template so that
> repeated processor creation is simple and fast. Again, I'm
> not sure if this fits well in the model with separated
> processor and renderer, it is likely that the user will
> create lots of identically configured processor+renderer
> combinations.
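
A rough sketch of reuse choice 1 with a factory, under the assumption that a processor is strictly single-use. ThrowawayProcessor and ThrowawayProcessorFactory are made-up names, and the "rendering" is reduced to computing a page count so that the retained state is visible.

```java
import java.util.List;

// Hypothetical single-use processor: no reset(), state stays readable after the run.
final class ThrowawayProcessor {
    private final int linesPerPage;
    private boolean used;
    private int pageCount;

    ThrowawayProcessor(int linesPerPage) {
        if (linesPerPage < 1) {
            throw new IllegalArgumentException("linesPerPage must be positive");
        }
        this.linesPerPage = linesPerPage;
    }

    void render(List<String> lines) {
        if (used) {
            throw new IllegalStateException("processor already used; create a new one");
        }
        used = true;
        // Ceiling division: a partially filled last page still counts.
        pageCount = (lines.size() + linesPerPage - 1) / linesPerPage;
    }

    // Available for as long as the object is kept.
    int getPageCount() {
        if (!used) {
            throw new IllegalStateException("render() has not been called");
        }
        return pageCount;
    }
}

// Hypothetical factory holding the preconfigured template.
final class ThrowawayProcessorFactory {
    private final int linesPerPage;

    ThrowawayProcessorFactory(int linesPerPage) {
        this.linesPerPage = linesPerPage;
    }

    ThrowawayProcessor newProcessor() {
        return new ThrowawayProcessor(linesPerPage);
    }
}
```

The factory is configured once; each rendering run asks it for a fresh processor and can query the page count afterwards.
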
> Choice 2:
> Make processor and/or renderers reusable by providing a
> reset() method. Again, in the model with separated processor
> and renderers users may be confused by having to reset two
> objects. Another interesting question would be whether the
> renderer is kept after resetting the processor or not.
> In the first case, the renderer is a part of the processor
> configuration rather than a rendering parameter and should
> be passed to the constructor rather than to the rendering
> method.
> Choice 3:
> Reusable processor with auto-reset. The disadvantage is that
> no state is kept after rendering has ended. There is still
> the possibly confusing problem whether a new renderer has
> to be used or the old renderer is kept.
>   = Caching =
> Caching is an interesting topic. It comes in two flavours:
> 1. Caching of stuff like images within a rendering run.
> 2. Caching across multiple rendering runs on reused objects
> The first is not only concerned with efficiency but also with
> predictability. Consider
>   <fo:page-sequence initial-page-number="1">
>      <fo:static-content>
>        <fo:external-graphic src=""/>
>     ...
>   <fo:page-sequence initial-page-number="20">
>      <fo:static-content>
>        <fo:external-graphic src=""/>
> Will the two page sequences feature the same or different
> pictures in the page header?
> XSLT explicitly says that within a transformer run, multiple
> access to the same URL results in the same content.
> The other interesting question is whether object reuse implies
> caching stuff like images across rendering runs. Whether this is
> useful depends on how often and how much stuff is shared. The
> use cases vary from rendering the same document several times
> to rendering documents sharing the same logo in the header to
> rendering documents at random.
> Choice 1:
> No caching at all, or a non-guaranteed caching. Risk reading
> sources multiple times, including possibly dynamically changing
> content.
> Perhaps we should leave the cache problem to another application
> layer. Cocoon appears to be quite good at it, no reinvention of the
> wheel necessary.
> Choice 2:
> Guarantee a URL is only read once within a rendering run. May imply
> memory problems.
> Choice 3:
> Expose caching across multiple renderings on a reused object.
> Needs an API for Cache control.
> (My opinion: not recommended).
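
A small sketch of caching choice 2, assuming one cache object is created per rendering run and discarded afterwards. RunScopedCache is a hypothetical name and the loader function stands in for whatever actually fetches the URL.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical per-run cache: each URI is loaded at most once,
// so repeated references within one run see identical content.
final class RunScopedCache {
    private final Map<URI, byte[]> cache = new HashMap<>();
    private final Function<URI, byte[]> loader;

    RunScopedCache(Function<URI, byte[]> loader) {
        this.loader = loader;
    }

    // The loader is only invoked on the first request for a given URI.
    byte[] get(URI uri) {
        return cache.computeIfAbsent(uri, loader);
    }
}
```

Everything stays in memory until the run ends, which is the memory concern mentioned under choice 2.
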
>   = Conclusions =
> Ok, concrete proposals for the new interface, tentatively
> called Processor, for various combinations of the design
> changes regarding output abstraction and reuse. (I use
> j.x.t.Source for input, this does not mean I'm biased to
> this. Ok, I am :-) )
> 1. Output is physical. Throw away.
>   class Processor {
>      // default renderer, may adapt to output type
>     Processor()
>     // configuration includes renderer choice
>     Processor(Configuration)
>     run( Source s, UserAgent ua)
>     render( Source s, OutputStream o)
>   }
> 2. Alternative with separate configuration method
>   class Processor {
>     Processor()
>     configure(Configuration)
>     run( Source s, UserAgent ua)
>     render( Source s, OutputStream o)
>   }
> 3. Output is physical. Alternative for avoiding calling an
>   explicitly configured renderer with an improper output type
>   class Processor {
>     Processor(Source s, UserAgent ua)
>     Processor(Source s, OutputStream o)
>     Processor(Source s, UserAgent ua, Configuration)
>     Processor(Source s, OutputStream o, Configuration)
>     run()
>   }
> 4. Alternative with separate configuration method
>   class Processor {
>     Processor(Source s, UserAgent ua)
>     Processor(Source s, OutputStream o)
>     configure(Configuration)
>     render() // or run()
>   }
> 5-8. Add a reset() which resets both processor and renderer
>   to either of the alternatives above.
> 9. Output is Renderer. Throw away. Not well suited for Factory.
>   class Processor {
>     Processor()
>     Processor(Configuration)
>     render( Source s, Renderer r)
>   }
>   class PDFRenderer {
>    PDFRenderer(OutputStream o)
>    PDFRenderer(OutputStream o, Configuration)
>   }
> 10. Add reset() to 9.
> 11. Variants for Renderer output and Factory approach
>   omitted (look ugly). Add your own proposals
>   = Further activity =
> Well, I suppose there will be a consensus built:
> - Whether to expose
>   1. Avalon component interface only
>   2. Both Avalon and non-avalon interface
>   3. Non-Avalon interface only
> - Design variant for input channel
> - Design variant for output channel
> - Design variant for object reuse
> - Whether to provide a factory (if appropriate)
> I hope this happens within the next week.
> I will then post a detailed interface to the list. I hope
> someone will help me to avalonise this, if necessary.
> After the interface is voted on, I'll implement this,
> with the objective to have running code in August. The
> current interface should be deprecated but kept for a
> few maintenance releases.
> Is this ok?
> J.Pietschmann
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, email: [EMAIL PROTECTED]
