Alt-Design status: XML handling

Peter B. West Wed, 20 Nov 2002 22:25:36 -0800

Fop-devs,

Here is an update on the front-end of alt-design, for those of you who may not be aware of what I have been doing. Attached is a broad overview diagram of the approach I have taken to XML parsing and FO tree building. I had been somewhat apprehensive about this approach, not because I thought it was in any way wrong, but because I seemed to be taking a very isolated path. My motivation can be summed up in the slogan SAX SUX. I couldn't understand why anyone would persist with it for any complex tasks, e.g. FOP. Some months ago, I stumbled across this article from XML.com: http://www.xml.com/pub/a/2001/12/19/jjc.html, which includes the following:
<quote>
Moving on to talk about the conventional ways XML was processed in programs, Clark protested that the current widespread APIs (SAX and DOM) made processing XML either too hard or too error-prone. He observed that these first generation APIs now lagged behind recent W3C Recommendations: Namespace support was "grafted on," and they are misaligned with the XML Infoset.

Echoing sentiments recently expressed in this publication, Clark said that SAX, though efficient, was very hard to use, and that DOM had obvious limitations due to the requirement that the document being processed be in memory. He suggested that what was needed was a standard "pull API," one that efficiently allowed random access to XML documents. Clark praised the XML APIs from Microsoft's C#/.NET platform in this regard, adding that Java could learn much from .NET: "Just because it comes from Microsoft, it's not necessarily bad."
</quote>

I hadn't followed up on what these APIs do until I had a quick look today.
http://www.softartisans.com/softartisans/netpaper-skonnard-best01.html includes this:

<quote>
The most common streaming API used today is the Simple API for XML (SAX). Microsoft introduced support for SAX in MSXML 3.0 but then determined that the SAX-based programming model was too obscure and unnecessarily difficult for the majority of their developer community. So to provide .NET developers with a more intuitive alternative, Microsoft introduced a new streaming API through the XmlReader class hierarchy.

The main difference between XmlReader and SAX is that the former allows the client to control the flow of execution by pulling the nodes from the stream one at a time while with the latter, the processor is in control, pushing the nodes back to the client one node at a time. This significant difference makes XmlReader much easier to use for most Microsoft developers that are used to working with firehose (forward-only/read-only) cursors in ADO.
</quote>

The above gives a feel for the XmlReader API, and nicely describes the impact of the buffering I have introduced between SAX and the FO tree builder. As the attached diagram shows, the SAX "push" stream is converted to a "pull" stream by the simple expedient of buffering. To the client, the buffer presents a series of get and expect methods, which reverse the direction of control. The FO tree builder is now in charge of events, rather than being a SAX-slave.
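To make the push-to-pull conversion concrete, here is a minimal sketch. It is not the alt-design code. It assumes the SAX parser runs on its own thread and that the callbacks do nothing except queue events, so the client can take them whenever it likes. PullBufferSketch, Event and trace are hypothetical names chosen for illustration; the real xmlevents buffer and FoXMLEvent carry far more.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch: SAX callbacks (push) feed a queue the client reads (pull). */
final class PullBufferSketch {
    /** Minimal event record, for illustration only. */
    static final class Event {
        static final int START = 0, END = 1, EOF = 2;
        final int type;
        final String name;
        Event(int type, String name) { this.type = type; this.name = name; }
    }

    private final BlockingQueue<Event> buffer = new ArrayBlockingQueue<>(64);

    /** Starts the parser on a separate thread (an assumption of this sketch). */
    PullBufferSketch(String xml) {
        Thread producer = new Thread(() -> {
            try {
                SAXParserFactory.newInstance().newSAXParser().parse(
                    new InputSource(new StringReader(xml)),
                    new DefaultHandler() {
                        @Override
                        public void startElement(String uri, String local,
                                                 String qName, Attributes atts) {
                            put(new Event(Event.START, qName));
                        }
                        @Override
                        public void endElement(String uri, String local, String qName) {
                            put(new Event(Event.END, qName));
                        }
                    });
            } catch (Exception e) {
                // Sketch only: a parse failure simply ends the stream early.
            } finally {
                put(new Event(Event.EOF, ""));
            }
        });
        producer.setDaemon(true);
        producer.start();
    }

    /** Push side: called from the SAX callbacks; blocks while the buffer is full. */
    private void put(Event e) {
        try {
            buffer.put(e);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
    }

    /** Pull side: the client asks for the next event and waits until one exists. */
    Event next() {
        try {
            return buffer.take();
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        }
    }

    /** Client-driven loop: "+name" for a start tag, "-name" for an end tag. */
    static List<String> trace(String xml) {
        PullBufferSketch events = new PullBufferSketch(xml);
        List<String> out = new ArrayList<>();
        for (Event e = events.next(); e.type != Event.EOF; e = events.next()) {
            out.add((e.type == Event.START ? "+" : "-") + e.name);
        }
        return out;
    }
}
```

The loop in trace() is the point: the client decides when to take the next event, and the parser waits on the full buffer instead of driving the client.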

(There may be some echoes here of Avalon's use of the Inversion Of Control pattern {there's that word} but the little I have read of IoC does not allow me to draw any conclusions.)

This change has dramatic effects on the structure and clarity of the code, all, IMO, for the better. Take, for instance, this code from FoSimplePageMaster.java.

    public FoSimplePageMaster
        (FOTree foTree, FONode parent, FoXMLEvent event)
        throws TreeException, FOPException
    {
        super(foTree, FObjectNames.SIMPLE_PAGE_MASTER, parent, event,
              FONode.LAYOUT_SET, sparsePropsMap, sparseIndices);
        // Process regions here
        FoXMLEvent regionEv;
        if ((regionEv = xmlevents.expectStartElement
                (FObjectNames.REGION_BODY,
                 XMLEvent.DISCARD_W_SPACE))
            == null)
            throw new FOPException
                ("No fo:region-body in simple-page-master: "
                 + getMasterName());
        // Process region-body
        regionBody = new FoRegionBody(foTree, this, regionEv);
        xmlevents.getEndElement(regionEv);

        // Remaining regions are optional
        if ((regionEv = xmlevents.expectStartElement
                (FObjectNames.REGION_BEFORE,
                 XMLEvent.DISCARD_W_SPACE))
            != null)
        {
            regionBefore = new FoRegionBefore(foTree, this, regionEv);
            xmlevents.getEndElement(regionEv);
        }

        if ((regionEv = xmlevents.expectStartElement
                (FObjectNames.REGION_AFTER,
                 XMLEvent.DISCARD_W_SPACE))
            != null)
        {
            regionAfter = new FoRegionAfter(foTree, this, regionEv);
            xmlevents.getEndElement(regionEv);
        }

        if ((regionEv = xmlevents.expectStartElement
                (FObjectNames.REGION_START,
                 XMLEvent.DISCARD_W_SPACE))
            != null)
        {
            regionStart = new FoRegionStart(foTree, this, regionEv);
            xmlevents.getEndElement(regionEv);
        }

        if ((regionEv = xmlevents.expectStartElement
                (FObjectNames.REGION_END,
                 XMLEvent.DISCARD_W_SPACE))
            != null)
        {
            regionEnd = new FoRegionEnd(foTree, this, regionEv);
            xmlevents.getEndElement(regionEv);
        }

        // Clean up the build environment
        makeSparsePropsSet();
    }

Note the calls to xmlevents.expectStartElement() and xmlevents.getEndElement(). xmlevents is the buffer instance.

Note also that the structure of the code does its own validation. It generates the simple-page-master subtree according to the content model

(region-body,region-before?,region-after?,region-start?,region-end?)

Interestingly, when this code is run against docs/examples/tables/background.fo, containing

    <fo:simple-page-master
        margin-right="1.5cm"
        margin-left="1.5cm"
        margin-bottom="2cm"
        margin-top="1cm"
        page-width="21cm"
        page-height="29.7cm"
        master-name="first">
      <fo:region-before extent="1cm"/>
      <fo:region-body margin-top="1cm"/>
      <fo:region-after extent="1.5cm"/>
    </fo:simple-page-master>

it throws an exception, as it should: fo:region-before appears ahead of the mandatory fo:region-body, which violates the content model.
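The way the code structure enforces the content model can be shown in miniature. The sketch below is an assumption-laden stand-in, not the alt-design classes: events are plain strings ("+name" for a start tag, "-name" for an end tag), expectStart() is taken to leave a non-matching event in the buffer, and getEnd() is taken to fail if the matching end tag is not next. ContentModelSketch and all its method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

/** Hypothetical miniature of the expect/get style; not the alt-design classes. */
final class ContentModelSketch {
    // "+name" stands for a start tag, "-name" for an end tag.
    private final Deque<String> events;

    ContentModelSketch(String... evs) {
        events = new ArrayDeque<>(Arrays.asList(evs));
    }

    /** Consumes the next event only if it starts the named element. */
    boolean expectStart(String name) {
        if (("+" + name).equals(events.peekFirst())) {
            events.removeFirst();
            return true;
        }
        return false;   // the event stays queued for the next expect call
    }

    /** Consumes the matching end tag, failing if it is not the next event. */
    void getEnd(String name) {
        if (!("-" + name).equals(events.pollFirst())) {
            throw new IllegalStateException("Missing end of " + name);
        }
    }

    /** Mirrors (region-body,region-before?,region-after?,region-start?,region-end?). */
    void simplePageMaster() {
        if (!expectStart("region-body")) {
            throw new IllegalStateException("No fo:region-body in simple-page-master");
        }
        getEnd("region-body");
        String[] optional = {"region-before", "region-after", "region-start", "region-end"};
        for (String name : optional) {
            if (expectStart(name)) {
                getEnd(name);
            }
        }
        // Stand-in for the enclosing builder expecting the parent's end tag next.
        if (!events.isEmpty()) {
            throw new IllegalStateException("Unexpected " + events.peekFirst());
        }
    }

    /** True if the child events satisfy the content model. */
    static boolean accepts(String... evs) {
        try {
            new ContentModelSketch(evs).simplePageMaster();
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

Fed the child order from background.fo (region-before, region-body, region-after), accepts() returns false at the first expectStart() call, while the same regions with region-body first are accepted. No separate validation pass is involved; the order of the calls is the content model.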

STATUS:

The XML pull buffering has been working for some considerable time. I have simply been extending the get/expect methods on top of the simpler methods as I have found a requirement for them in building the FO tree.

The (non-instantiable) property classes and the FO classes are in place, and are sufficiently advanced to allow for the testing of the FO tree building process. I am currently working through the fo examples. Once that is done, I will have to fill in any blanks.

TODO:

Corresponding properties.
Complete PropertyValue validation.
Comprehensive testing.
Complete provision for markers in all FO classes.
Other things I can't think of now.

More later on the relationship between the FO tree builder and the layout engine.

Peter
--
Peter B. West [EMAIL PROTECTED] http://www.powerup.com.au/~pbwest/
"Lord, to whom shall we go?"

<<inline: alt.design.png>>
