Hi foppers,

I know I should provide code instead of talking, but then...

The current FOP API suffers from a variety of deficiencies:
- unexpected statefulness (most horribly embodied in
  XSLTInputHandler)
- weak abstraction of input and output channels
- incomplete separation of abstraction levels
- cruft :-)
Some points I think should be followed in the design of a new and
hopefully better API:
- Atomic initialisation. After creating a processor, it should be
  ready to run. Mandatory parametrisation data should be passed either
  to the constructor or to the method(s) running the formatting
  process; everything else should be initialised from sensible
  defaults.
- No file names, anywhere. Strings representing resources are always
  URLs: on the command line, in the config file, everywhere. In the
  API, use java.io.File if files are deemed necessary.
- No baseDir. Define a baseURL concept. Pass all URLs through a
  resolver.
- Better abstraction of input and output channels.

Whether only an Avalon component API is exposed, or whether there is
an Avalon-free API and a separate Avalon component, is a matter of
taste. In either case, I'd like to have the possibility to run a FOP
core without access to external config *files*. This means I can
create a new Driver() and pass all config data by Java properties and
service definitions, and by using a user-written Configuration class
passed to the Driver.configure() method for everything too complex to
be passed as properties and services (i.e. user font config). A FOP
default Configuration class could read a system and a user config
file. From what I've gathered about Avalon, this is already
implemented this way there. However, I'm not sure, and I'm not
dogmatic about this.

The problem I have is the design space for abstracting input and
output channels.

= Input =

For input, we have the javax.xml.transform.Source stuff, which
provides a nice unified encapsulation of SAX, DOM and serialised XML
streams, as well as SAX and DOM themselves. The nice part about the
j.x.t.Source stuff is that it shields the user from as much of the
lower-level XML stuff as possible, in particular from setting up a
parser in the common case of having serialised XML as input.

Choice 1: Use j.x.t.Source as FOP input.
Implement an o.a.f.stream.XSLTStreamSource as a j.x.t.s.StreamSource
subclass for providing XSLT power (see end of message for an
interface proposal).

Choice 2: Provide SAX and DOM as input (getContentHandler() and
render(DOM)).

Choice 3: Provide (more precisely: expose) both. Redundant, but,
well...

= Output =

Next problem: output. We have two rather radically different output
types: byte streams and GUI panels. A real stumbling block is that
the object the output is written to is volatile: it is likely to
change with every rendering run, while the kind of renderer as well
as the renderer-specific configuration is more stable. This has
profound implications for the API design.

Choice 1: The interface is at the final output level. This means
render()/run() methods for each of the classes:

    render(OutputStream)  // for PDF, MIF, PS, ...
    run(UserAgent)        // for AWT...

We could add a print() method if necessary. Rationale for choosing
the method names: render() means the input FO is rendered to a byte
stream; run() means the UserAgent is started and the user can
interact with it. The run() method will return when the user somehow
ends the interaction process and shuts down the UserAgent. Do I
interpret the current state correctly?

This choice implies that the renderer and any configuration data
specific to the renderer have to be passed to the Driver (processor)
through the Driver configuration methods. Because some renderers can
be assumed to have a lot of renderer-specific config data which
warrants a structure imposed on it, I'm not very fond of the whole
idea.

Choice 2: The interface is the renderer. This means the renderer
object has to be created by the user explicitly. The advantage is
that the renderer configuration can be designed to fit the renderer
rather than being passed through a more generic interface at the
Driver.
Also, renderer configuration and the renderer-independent processor
configuration are better separated, which might be a good idea, in
particular for people who want to render the same FO to several
different output formats. In this case, a typical code snippet would
look like

    Processor p = new Processor(
        new ProcessorConfiguration(new File("myconfig.xml")));
    Renderer r = new PDFRenderer(
        new PDFRendererConfiguration("cocoon:/myPDFconfig.xml"));
    p.render(new StreamSource(new File("foo.fo")), r);

(I don't mind if the configuration is not passed to the constructor
but to a configure() method; this is just for illustration.)

= Reuse =

Last problem: reuse of processors and renderers. The XSLT processor
of the JAXP interface and presumably many XML parsers are throwaway
objects and not meant to be reused after the "work" method
(transform(), parse()) has been called.

Choice 1: Make both processor and renderers throwaway objects. No
reset() method. Advantage: the state after the rendering has ended
can be retrieved as long as the objects are kept. The most common use
case for this which has been mentioned on this list is inquiring the
total number of pages rendered. There are other use cases for sure.
I'm not sure how well this would fit into the Avalon component model.
Can someone enlighten me?

Another consequence would be factory objects, where a user can
conveniently prepare a preconfigured template so that repeated
processor creation is simple and fast. Again, I'm not sure if this
fits well in the model with separated processor and renderer; it is
likely that the user will create lots of identically configured
processor+renderer combinations.

Choice 2: Make processor and/or renderers reusable by providing a
reset() method. Again, in the model with separated processor and
renderers, users may be confused by having to reset two objects.
Another interesting question would be whether the renderer is kept
after resetting the processor or not.
In the first case, the renderer is a part of the processor
configuration rather than a rendering parameter and should be passed
to the constructor rather than to the rendering method.

Choice 3: Reusable processor with auto-reset. The disadvantage is
that no state is kept after rendering has ended. There is still the
possibly confusing problem of whether a new renderer has to be used
or the old renderer is kept.

= Caching =

Caching is an interesting topic. It comes in two flavours:
1. Caching of stuff like images within a rendering run.
2. Caching across multiple rendering runs on reused objects.

The first is not only concerned with efficiency but also with
predictability. Consider

    <fo:page-sequence initial-page-number="1">
      <fo:static-content>
        <fo:external-graphic src="http://dynamic.com/curr-time.gif"/>
    ...
    <fo:page-sequence initial-page-number="20">
      <fo:static-content>
        <fo:external-graphic src="http://dynamic.com/curr-time.gif"/>

Will the two page sequences feature the same or different pictures in
the page header? XSLT explicitly says that within a transformer run,
multiple accesses to the same URL result in the same content.

The other interesting question is whether object reuse implies
caching stuff like images across rendering runs. Whether this is
useful depends on how often and how much stuff is shared. The use
cases vary from rendering the same document several times, to
rendering documents sharing the same logo in the header, to rendering
documents at random.

Choice 1: No caching at all, or non-guaranteed caching. Risk reading
sources multiple times, including possibly dynamically changing
content. Perhaps we should leave the cache problem to another
application layer. Cocoon appears to be quite good at it, no
reinvention of the wheel necessary.

Choice 2: Guarantee a URL is only read once within a rendering run.
May imply memory problems.

Choice 3: Expose caching across multiple renderings on a reused
object. Needs an API for cache control.
(My opinion: not recommended.)

= Conclusions =

OK, concrete proposals for the new interface, tentatively called
Processor, for various combinations of the design choices regarding
output abstraction and reuse. (I use j.x.t.Source for input; this
does not mean I'm biased towards it. OK, I am :-) )

1. Output is physical. Throwaway.

    class Processor {
      // default renderer, may adapt to output type
      Processor()
      // configuration includes renderer choice
      Processor(Configuration)
      run(Source s, UserAgent ua)
      render(Source s, OutputStream o)
    }

2. Alternative with separate configuration method.

    class Processor {
      Processor()
      configure(Configuration)
      run(Source s, UserAgent ua)
      render(Source s, OutputStream o)
    }

3. Output is physical. Alternative for avoiding calling an explicitly
   configured renderer with an improper output type.

    class Processor {
      Processor(Source s, UserAgent ua)
      Processor(Source s, OutputStream o)
      Processor(Source s, UserAgent ua, Configuration)
      Processor(Source s, OutputStream o, Configuration)
      run()
    }

4. Alternative with separate configuration method.

    class Processor {
      Processor(Source s, UserAgent ua)
      Processor(Source s, OutputStream o)
      configure(Configuration)
      render() // or run()
    }

5-8. Add a reset() which resets both processor and renderer to any of
   the alternatives above.

9. Output is Renderer. Throwaway. Not well suited for Factory.

    class Processor {
      Processor()
      Processor(Configuration)
      render(Source s, Renderer r)
    }
    class PDFRenderer {
      PDFRenderer(OutputStream o)
      PDFRenderer(OutputStream o, Configuration)
    }

10. Add reset() to 9.

11. Variants for Renderer output and Factory approach omitted (look
    ugly).

Add your own proposals.

= Further activity =

Well, I suppose a consensus will be built on:
- Whether to expose
  1. Avalon component interface only
  2. Both Avalon and non-Avalon interface
  3. Non-Avalon interface only
- Design variant for input channel
- Design variant for output channel
- Design variant for object reuse
- Whether to provide a factory (if appropriate)

I hope this happens within the next week. I will then post a detailed
interface to the list. I hope someone will help me to avalonise this,
if necessary. After the interface is voted on, I'll implement this,
with the objective of having running code in August. The current
interface should be deprecated but kept for a few maintenance
releases.

Is this OK?

J.Pietschmann
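P.S. To make the baseURL/resolver point and caching choice 2 (a URL
is read only once within a rendering run) a bit more concrete, here
is a minimal Java sketch. It is not existing FOP code: the class
RunScopedResolver, its methods and the injected loader function are
names made up for illustration, and actually fetching the bytes is
left to whoever supplies the loader. The idea is merely to resolve
every resource string against a base URL and to keep one cache per
run.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper, not part of FOP: create one instance per rendering run.
class RunScopedResolver {
    private final URI baseUrl;
    // Assumption: the caller supplies the code that fetches the bytes behind a URL.
    private final Function<URI, byte[]> loader;
    // Cache lives exactly as long as this object, i.e. one rendering run.
    private final Map<URI, byte[]> cache = new HashMap<>();

    RunScopedResolver(URI baseUrl, Function<URI, byte[]> loader) {
        this.baseUrl = baseUrl;
        this.loader = loader;
    }

    // Resolve a resource string against the base URL; absolute URLs pass through.
    URI resolve(String href) {
        return baseUrl.resolve(href).normalize();
    }

    // Return the content for href. The loader is called once per resolved URL
    // (provided it returns non-null content); later calls get the cached bytes.
    byte[] fetch(String href) {
        return cache.computeIfAbsent(resolve(href), loader);
    }
}
```

With something like this, both fo:external-graphic references in the
caching example would see the same bytes within one run, and dropping
the object after the run drops the cache, which fits the throwaway
variants. The memory concern of choice 2 stays, of course, since
everything fetched is kept until the run ends.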