Tom, my comments are embedded below.

My impression (until this discussion started) was that both are linear
in input, able to quit at any point, but the difference was threefold:
  (a) since SAX events are context-independent, and the logic of XML
often depends on what has gone before it, event handlers must often
effectively figure out the program state ("Oh, here's a number-tag;
what was I doing?") where XPP encodes much of the program state in
the program counter ("at this point, I'd better be reading a number.")
That gives XPP a little more speed, a little more readability; the
latter seems more important, really.
I'm seeing the same thing.  Just a little (insignificant) speed ...
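
To make the contrast in (a) concrete, here is a minimal sketch that reads the same two numbers both ways. It is only an illustration under assumptions: the JDK's StAX reader (javax.xml.stream) stands in for XPP as a pull parser, and the <point> document, the StateDemo class and its method names are made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class StateDemo {
    // Hypothetical sample document, not taken from the thread.
    static final String XML = "<point><x>1</x><y>2</y></point>";

    // SAX style: each callback has to work out where it is from saved state.
    static class PointHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder(); // state kept between callbacks
        int x, y;

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            text.setLength(0); // start collecting text for this element
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            // "Here's an end-tag; what was I doing?"
            if ("x".equals(qName)) x = Integer.parseInt(text.toString().trim());
            else if ("y".equals(qName)) y = Integer.parseInt(text.toString().trim());
        }
    }

    static int[] readPointSax() throws Exception {
        PointHandler h = new PointHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(XML)), h);
        return new int[] {h.x, h.y};
    }

    // Pull style (StAX as a stand-in for XPP): the position in the code is the state.
    static int[] readPointPull() throws Exception {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(XML));
        r.nextTag();                                         // <point>
        r.nextTag();                                         // <x>
        int x = Integer.parseInt(r.getElementText().trim()); // now at </x>
        r.nextTag();                                         // <y>
        int y = Integer.parseInt(r.getElementText().trim()); // now at </y>
        r.nextTag();                                         // </point>
        return new int[] {x, y};
    }

    public static void main(String[] args) throws Exception {
        int[] a = readPointSax();
        int[] b = readPointPull();
        System.out.println(a[0] + "," + a[1] + " / " + b[0] + "," + b[1]);
    }
}
```

Both versions do the same amount of parsing work; the difference is where the "what was I doing?" bookkeeping lives.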

  (b) since SAX is data-driven, but the basic language structure is
demand-driven, some possible designs are awkward: if you want to read
a part of a file and then process it and do something else (say, wait
for a user input) before you read the next part of the file, you must
put the SAX handling into a separate thread and suspend it. Again, XPP
wins a bit in speed, a bit in readability (and writability).
Why can't you wait inside the callback method of the SAX handler, on the same thread? Why does a separate thread need to be created?
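
For reference, a minimal sketch of the demand-driven shape described in (b), again using the JDK's StAX reader as an assumed stand-in for XPP; the <list>/<item> document and the DemandDemo, nextItem and run names are hypothetical. The caller asks for one item, does something else, then asks for the next. With SAX, parse() does not return until the document ends, so a callback can certainly block on the same thread, but the "something else" then has to run inside the callback rather than in the caller's own loop.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class DemandDemo {
    // Advance to the next <item> and return its text, or null at end of input.
    static String nextItem(XMLStreamReader r) throws XMLStreamException {
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.START_ELEMENT
                    && "item".equals(r.getLocalName())) {
                return r.getElementText(); // leaves the reader at </item>
            }
        }
        return null;
    }

    static List<String> run(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> log = new ArrayList<>();
        String item;
        // The caller owns the loop: parsing resumes only when it asks for more.
        while ((item = nextItem(r)) != null) {
            log.add("read " + item);
            log.add("other work"); // stand-in for waiting on user input
        }
        return log;
    }

    public static void main(String[] args) throws XMLStreamException {
        // prints [read a, other work, read b, other work]
        System.out.println(run("<list><item>a</item><item>b</item></list>"));
    }
}
```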

  (c) since XPP can effectively skip over subtrees, there is at least
in principle the possibility of a sublinear implementation working on
tree-structured data, not necessarily DOM. I've no idea if anybody
actually does this, but you can certainly serialize a tree in which
each start-tag (for all or some start-tags) stores the position of
the matching end-tag, so that skipping is constant-time and data access
to a given tree-location is ...well, not quite logarithmic, but O(D*K)
where D would be the depth of the item you seek and K would be the
branching factor for the tree. (Of course you can do better with more
complex representations, all of which will be invisible to the program
calling the XPP parser.) Something like that, if I'm awake yet.
XPP can skip over elements that occur "exactly once", but not those that are "optional" or "multiple"; for those, XPP has to "read" and "check" before "next". Similarly, in SAX, your handler will check and return immediately. Again, a similar workload and an insignificant difference.

For the start-tag to store the position of the end-tag, don't you need to parse through the whole document at least once? Isn't that the DOM approach?
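
A toy sketch of the idea in (c) may help here. Everything in it is an assumption made for illustration: the token array (a name opens an element, "/" closes it), the SkipIndexDemo class and its methods are not from XPP or any real serialization format. It suggests that the end positions do need one full pass to compute, for instance when the tree is written out, but that a later reader can then skip a subtree in constant time and reach an item in about D*K steps without building a DOM.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class SkipIndexDemo {
    // One pass over the tokens: end[i] is the index of the "/" that closes
    // the element opened at token i (entries for "/" tokens are unused).
    static int[] buildEndIndex(String[] tokens) {
        int[] end = new int[tokens.length];
        Deque<Integer> open = new ArrayDeque<>();
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals("/")) end[open.pop()] = i;
            else open.push(i);
        }
        return end;
    }

    // Follow a path of child numbers from the root at index 0.
    // Each level skips at most K sibling subtrees, so the cost is O(D*K).
    static int seek(int[] end, int[] path) {
        int pos = 0;
        for (int childNo : path) {
            int child = pos + 1; // first child, if there is one
            for (int k = 0; k < childNo && child < end[pos]; k++) {
                child = end[child] + 1; // constant-time skip over a whole subtree
            }
            if (child >= end[pos]) throw new IllegalArgumentException("no such child");
            pos = child;
        }
        return pos;
    }

    public static void main(String[] args) {
        // Encodes <a><b><c/></b><d><e/><f/></d></a>
        String[] tokens = {"a", "b", "c", "/", "/", "d", "e", "/", "f", "/", "/", "/"};
        int[] end = buildEndIndex(tokens);
        int pos = seek(end, new int[] {1, 1}); // second child of the second child
        System.out.println(tokens[pos]);       // prints f
    }
}
```

Reaching f never touches the tokens inside b, which is where a sublinear read would come from.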

Rgds, Ricky
