Re: Alt-Design status: XML handling

Peter B. West Mon, 25 Nov 2002 05:56:55 -0800

Oleg Tkachenko wrote:

Peter B. West wrote:
Why is it easier for developers to use? Is it because the API is less complex or more easily understood? Not really. As you point out, the SAX API is not all that complex. The problem is that the processing model of SUX is completely inverted.

Well, I believe it's more a philosophical question, or a question of programming style: push vs pull, imperative languages vs declarative languages, etc. etc., an ancient holy war. One likes to define rules aka SAX handlers, another likes to weave a web from if statements, only to be able to control processing order ;) Both pull and push have pros and cons, and it's a pity Java still doesn't have a full-fledged pull parsing API (btw, James Clark is working on StAX[1], so it's a matter of time).

I don't believe it is only a matter of style. I think the detrimental effects of push for general programming are glaringly obvious. That, I think, rather than catering for simple-minded developers, is what motivated MS' abandonment of SAX. I speak as a long-time anti-MS bigot.

You may have come to like writing XSLT that way.
It's the only way to write non-hello-world stylesheets in XSLT, actually. Don't forget, XSLT is a declarative language, so analogies with Java are probably just irrelevant; they are different beasts. The question is, what is good for the FO tree building stuff? Probably you are right, pull is more suitable, but the bad thing is that the real input is a SAX stream, hence we must translate push to pull (funnily enough, MS considers this task infeasible in the XMLReader documentation).

I haven't read the documentation, but it may be that they are referring to the infeasibility of moving code built around SAX to an XmlReader environment.

Hence next question is the cost of your interim buffer, what do you think could be its peak and average size?

At the moment it is more expensive than it need be; there is no event pool. I am writing one now. It's fairly trivial, as you can imagine. The buffer is implemented as a circular buffer, currently of 128 elements, but it has been set at 32, and 64 should be more than enough. The circular buffer places an upper limit on the size, and synchronizes (in a broad sense) the activities of producer (parser) and consumer (tree builder.)

parser:
until buffer full, write events to buffer
notify
wait

tree builder:
wait
until buffer empty, read events from buffer
notify

In the SAX model, the throttle on parser throughput is the downstream processing that is immediately triggered by the start and end events generated by the parser.

In the buffered model, the throttle is the circular buffer and the waits that occur on it.
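The producer/consumer arrangement described above can be sketched in Java roughly as follows. This is only an illustration under stated assumptions, not the actual Alt-Design code: the names EventBuffer, put, take and EventBufferDemo are hypothetical, events are plain strings, and the sketch notifies after every event, whereas the outline above fills or drains the buffer before notifying.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical fixed-capacity circular buffer shared by a parser thread and a tree-builder thread. */
final class EventBuffer<E> {
    private final Object[] slots;
    private int head;   // index of the next element to read
    private int count;  // number of elements currently buffered

    EventBuffer(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        slots = new Object[capacity];
    }

    /** Called by the producer (parser side); blocks while the buffer is full. */
    synchronized void put(E event) throws InterruptedException {
        while (count == slots.length) {
            wait();                       // throttle: parser stalls until the consumer frees a slot
        }
        slots[(head + count) % slots.length] = event;
        count++;
        notifyAll();                      // wake a consumer waiting on an empty buffer
    }

    /** Called by the consumer (tree builder); blocks while the buffer is empty. */
    @SuppressWarnings("unchecked")
    synchronized E take() throws InterruptedException {
        while (count == 0) {
            wait();                       // nothing to read yet
        }
        E event = (E) slots[head];
        slots[head] = null;               // drop the reference so it can be collected or pooled
        head = (head + 1) % slots.length;
        count--;
        notifyAll();                      // wake a producer waiting on a full buffer
        return event;
    }
}

final class EventBufferDemo {
    /** Pushes n string events through a buffer of the given capacity and returns them in arrival order. */
    static List<String> run(int capacity, int n) throws InterruptedException {
        EventBuffer<String> buffer = new EventBuffer<>(capacity);
        Thread parser = new Thread(() -> {
            try {
                for (int i = 0; i < n; i++) buffer.put("event-" + i);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        parser.start();
        List<String> seen = new ArrayList<>();
        for (int i = 0; i < n; i++) seen.add(buffer.take());
        parser.join();
        return seen;
    }
}
```

In a real push-to-pull bridge, the SAX ContentHandler callbacks would call put() on the parser thread while the tree builder calls take() whenever it wants the next event. The fixed capacity is what bounds the peak size of the interim buffer.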

Of course, as I have mentioned recently. And as I also said, the cost of parsing relative to the intensive downstream element processing of FOP is small.

If so, isn't it too early to optimize xml handling altogether? What would we benefit from moving from push to pull? Well, sort of automatic validation is a benefit indeed, but I'm not sure it's enough.

This is not an optimisation, but a fundamental design decision. It's all or nothing. See the comments about the feasibility of moving from one model to the other.

The whole question is context-dependent. If you are engaged in the peephole processing of SUX you may be obliged to use external validation. With top-down processing you have more choice, because your context is travelling with you.
btw, what about unexpected content model objects? Will this fail?
<fo:simple-page-master master-name="default">
    <fo:region-body/>
    <fo:block/>
</fo:simple-page-master>

Unexpected content models will throw an exception. How that is handled is another question. At the moment, while I am in a debugging phase, most exceptions just propagate up, but all the usual flexibility of the exception system is available for refinement.
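To make the idea concrete, here is a minimal sketch of how a pull-style reader might reject the fo:block in the example above. Everything in it is an assumption for illustration: the class and exception names are hypothetical, child events are reduced to element names supplied by an Iterator, and the list of allowed children is an illustrative subset that ignores ordering and cardinality.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical exception for a child element that the content model does not allow. */
final class UnexpectedContentException extends Exception {
    UnexpectedContentException(String message) { super(message); }
}

final class SimplePageMasterReader {
    // Illustrative subset of children accepted inside fo:simple-page-master.
    private static final List<String> ALLOWED = Arrays.asList(
        "fo:region-body", "fo:region-before", "fo:region-after",
        "fo:region-start", "fo:region-end");

    /**
     * Pulls child element names for one fo:simple-page-master and
     * returns how many regions were accepted. The reader already knows
     * which parent it is inside, so the check needs no external state.
     */
    static int readChildren(Iterator<String> childNames) throws UnexpectedContentException {
        int regions = 0;
        while (childNames.hasNext()) {
            String name = childNames.next();
            if (!ALLOWED.contains(name)) {
                throw new UnexpectedContentException(
                    "unexpected <" + name + "> in fo:simple-page-master");
            }
            regions++;
        }
        return regions;
    }
}
```

Called with the names fo:region-body and fo:block, readChildren throws on the second child; the caller decides whether to propagate, log, or recover.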

Don't get me wrong here. I'm not saying that external validation is wrong, merely that with a pull model, the need is reduced. There may still be a strong case for it, but not as strong as with SUX.

You are right, and that, btw, makes it possible to keep external validation optional and still have a reasonable level of validation for free.

[1] http://www.jcp.org/en/jsr/detail?id=173

It encourages me greatly that there is so much activity going on in this area. Especially interesting is the Xerces XNI XMLPullParserConfiguration Interface.

Peter
--
Peter B. West [EMAIL PROTECTED] http://www.powerup.com.au/~pbwest/
"Lord, to whom shall we go?"
