Tom, my comments are embedded below.
My impression (until this discussion started) was that both are linear in the input and able to quit at any point, but that the difference was three-fold: (a) since SAX events are context-independent, and the logic of XML often depends on what has gone before it, event handlers must often effectively figure out the program state ("Oh, here's a number-tag; what was I doing?"), whereas XPP encodes much of the program state in the program counter ("at this point, I'd better be reading a number"). That gives XPP a little more speed and a little more readability; the latter seems more important, really.
I'm seeing the same thing. Just a little (insignificant) speed ...
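To make the readability point in (a) concrete, here is a rough sketch rather than anything from XPP itself. It assumes Python's standard `xml.sax` as a stand-in for SAX and `xml.dom.pulldom` as a stand-in for an XPP-style pull parser; the `<point>` document and all function names are made up for illustration.

```python
import xml.sax
from xml.dom import pulldom

# Hypothetical sample document, not taken from the thread.
DOC = "<point><x>1</x><y>2</y></point>"


class PointHandler(xml.sax.ContentHandler):
    """Push style: the handler must remember where it is between callbacks."""

    def __init__(self):
        super().__init__()
        self.current = None   # "what was I doing?" lives in this field
        self.values = {}

    def startElement(self, name, attrs):
        self.current = name

    def characters(self, content):
        if self.current in ("x", "y"):
            self.values[self.current] = self.values.get(self.current, "") + content

    def endElement(self, name):
        self.current = None


def parse_push(text):
    handler = PointHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return int(handler.values["x"]), int(handler.values["y"])


def read_number(events, tag):
    """Pull style: advance to <tag> and return its text as an int."""
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == tag:
            events.expandNode(node)   # pull in the element's children
            return int("".join(child.data for child in node.childNodes))
    raise ValueError("missing <%s>" % tag)


def parse_pull(text):
    events = pulldom.parseString(text)
    x = read_number(events, "x")   # "at this point, I'd better be reading a number"
    y = read_number(events, "y")
    return x, y
```

Both functions return the same pair for the sample document; the difference is only where the "what comes next" knowledge lives, a field on the handler versus the order of the calls.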
(b) since SAX is data-driven, but the basic language structure is demand-driven, some possible designs are awkward: if you want to read a part of a file, then process it and do something else (say, wait for user input) before you read the next part of the file, you must put the SAX handling into a separate thread and suspend it. Again, XPP wins a bit in speed and a bit in readability (and writability).
Why can't you wait within the callback method of a SAXHandler, in the same thread? Why does a separate thread need to be created?
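For reference, the demand-driven shape described in (b) might look like the sketch below. It again assumes Python's `xml.dom.pulldom` as a stand-in for XPP; the `<log>` document, `next_entry`, `read_with_pauses`, and the `between` callback are all hypothetical names.

```python
from xml.dom import pulldom

# Hypothetical sample document, not taken from the thread.
LOG = "<log><entry>first</entry><entry>second</entry></log>"


def next_entry(events):
    """Read just far enough to return the next <entry> text, or None at the end."""
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "entry":
            events.expandNode(node)
            return "".join(child.data for child in node.childNodes)
    return None


def read_with_pauses(text, between):
    """Read one entry, let the caller do something else, then read the next."""
    events = pulldom.parseString(text)
    trace = []
    entry = next_entry(events)
    while entry is not None:
        trace.append(entry)
        trace.append(between())   # e.g. wait for user input; the parser sits idle
        entry = next_entry(events)
    return trace
```

Here the caller decides when parsing resumes, so no extra thread is involved. With SAX the same pause would have to happen inside a callback, which is possible in the same thread but leaves the "something else" running underneath the parser's call stack.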
(c) since XPP can effectively skip over subtrees, there is at least in principle the possibility of a sublinear implementation working on tree-structured data, not necessarily DOM. I've no idea if anybody actually does this, but you can certainly serialize a tree in which each start-tag (for all or some start-tags) stores the position of the matching end-tag, so that skipping is constant-time and data access to a given tree location is ... well, not quite logarithmic, but O(D*K), where D is the depth of the item you seek and K is the branching factor of the tree. (Of course you can do better with more complex representations, all of which will be invisible to the program calling the XPP parser.) Something like that, if I'm awake yet.
XPP can skip over those elements that occur "exactly once", but not those that are "optional" or "multiple", which require XPP to "read" and "check" before "next". Similarly, in SAX, your handler will check and return immediately. Again, a similar workload and an insignificant difference.
For the start tag to store the position of the end tag, do you need to parse through the whole document at least once? The DOM approach?
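One way to picture the serialization in (c) is the sketch below. It is only an illustration in Python with an invented layout (a flat list in which each entry holds a name and the index just past its subtree), not a format any XPP implementation is known to use; `flatten`, `find`, and the sample tree are hypothetical.

```python
def flatten(tree, out=None):
    """Serialize (name, [children]) into a flat list of [name, end] entries.

    `end` is the index just past the node's subtree, playing the role of
    "the position of the matching end-tag".
    """
    if out is None:
        out = []
    name, children = tree
    slot = len(out)
    out.append([name, None])
    for child in children:
        flatten(child, out)
    out[slot][1] = len(out)   # filled in once the whole subtree is written
    return out


def find(flat, path):
    """Return (index, entries_visited) for the node at `path`, or (None, visited)."""
    pos, limit = 0, len(flat)
    visited = 0
    index = None
    for name in path:
        while pos < limit:
            visited += 1
            if flat[pos][0] == name:
                break
            pos = flat[pos][1]   # constant-time skip over a sibling's subtree
        else:
            return None, visited
        index = pos
        limit = flat[pos][1]     # stay inside the matched node's subtree
        pos += 1                 # its first child, if any
    return index, visited


# Hypothetical sample tree.
TREE = ("root", [("a", [("a1", []), ("a2", [])]),
                 ("b", [("b1", []), ("b2", [])])])
FLAT = flatten(TREE)
```

In this layout `find` only touches siblings along the path, so the work is bounded by depth times branching factor. Writing the end positions does take one full pass over the tree, but in this sketch that pass happens once at serialization time rather than on every read.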
Rgds, Ricky