First, thanks to Neil for his answer (on the xerces-j-user list), which I
can no longer find to quote properly.
Here is an attempt at a solution that seems to pretty much satisfy my
needs, but I would welcome more comments on the quality of the
approach.
To re-parse (or parse) a single element (and its child content), it
seems sufficient to have the following information: the URL of the
document, the byte positions (start and end) of the whole element, and
the byte positions of the start tags of all its ancestor elements
(plus the ability to feed the corresponding closing tags).
This could easily be piped through a stream that reads only the
necessary bits and skips the rest, thereby feeding the parser only
what it needs.
Here's an example:
<a> <b><c>blop</c></b> <b id="b1"><c>blip</c></b> </a>
To reparse only the content of the b with id "b1", I can then feed the
parser:
<a><b id="b1"><c>blip</c></b></a>
thus avoiding the presumably enormous content of the first b element.
(Note: this says nothing about what the parsing actually feeds. I am
thinking of JDOM, but one is free to choose; it is just SAX events.)
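The splicing described above can be sketched roughly as follows. This is
only a minimal sketch under assumptions: the byte offsets are taken as
already known (obtaining them is exactly the open question), the document
is held in memory for brevity, and the names FragmentReparse, Range,
splice and parseElementNames are made up for illustration. The parser is
the standard JAXP SAX parser.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

final class FragmentReparse {

    /** Hypothetical index entry: a byte range [start, end) in the document. */
    static final class Range {
        final int start, end;
        Range(int start, int end) { this.start = start; this.end = end; }
    }

    /**
     * Concatenates the ancestors' start tags (outermost first), the element
     * itself, and synthesized closing tags for the ancestors.
     */
    static InputStream splice(byte[] doc, List<Range> ancestorStartTags,
                              List<String> ancestorNames, Range element) {
        List<InputStream> parts = new ArrayList<>();
        for (Range r : ancestorStartTags) {
            parts.add(new ByteArrayInputStream(doc, r.start, r.end - r.start));
        }
        parts.add(new ByteArrayInputStream(doc, element.start,
                                           element.end - element.start));
        // Close the ancestors innermost first.
        for (int i = ancestorNames.size() - 1; i >= 0; i--) {
            byte[] close = ("</" + ancestorNames.get(i) + ">")
                    .getBytes(StandardCharsets.UTF_8);
            parts.add(new ByteArrayInputStream(close));
        }
        return new SequenceInputStream(Collections.enumeration(parts));
    }

    /** Parses the stream with SAX and records the element names seen. */
    static List<String> parseElementNames(InputStream in) throws Exception {
        List<String> seen = new ArrayList<>();
        SAXParserFactory.newInstance().newSAXParser().parse(in, new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName,
                                     Attributes atts) {
                seen.add(qName);
            }
        });
        return seen;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<a> <b><c>blop</c></b> <b id=\"b1\"><c>blip</c></b> </a>";
        byte[] doc = xml.getBytes(StandardCharsets.UTF_8);
        // The text is pure ASCII, so char offsets equal byte offsets here.
        int start = xml.indexOf("<b id=");
        int end = xml.indexOf("</b>", start) + "</b>".length();
        InputStream in = splice(doc,
                Arrays.asList(new Range(0, 3)),   // the "<a>" start tag
                Arrays.asList("a"),
                new Range(start, end));
        System.out.println(parseElementNames(in)); // prints [a, b, c]
    }
}
```

The parser never sees the first b element. For a real file the
in-memory array would be replaced by something that seeks or skips in
the underlying stream, and the handler by whatever builds the tree.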
I see at least two applications of this:
- an XML source editor that has, say, a tree view could reparse much
less and thereby be much more responsive (try jEdit's excellent XML
mode; the parsing step is heavy!).
- to make a poor man's (read-only) database of XML content, it would be
sufficient to build an index of the elements with an id, which would
then be fed to the parser in response to a query.
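The second application could look something like the sketch below. It is
a rough illustration only: IdIndex, put and fetch are hypothetical
names, the offsets are entered by hand because I do not yet know how to
obtain them from the parser, and only standard java.io / java.nio calls
are used.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

final class IdIndex {

    // id -> { byte offset of the element's start tag, byte length of the element }
    private final Map<String, long[]> ranges = new HashMap<>();

    void put(String id, long offset, long length) {
        ranges.put(id, new long[] { offset, length });
    }

    /** Reads only the indexed element's bytes; returns null for an unknown id. */
    byte[] fetch(Path file, String id) throws IOException {
        long[] r = ranges.get(id);
        if (r == null) {
            return null;
        }
        byte[] buf = new byte[(int) r[1]];
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(r[0]);      // skip everything before the element
            raf.readFully(buf);  // read exactly the element's bytes
        }
        return buf;
    }

    public static void main(String[] args) throws IOException {
        String xml = "<a> <b><c>blop</c></b> <b id=\"b1\"><c>blip</c></b> </a>";
        Path file = Files.createTempFile("doc", ".xml");
        Files.write(file, xml.getBytes(StandardCharsets.UTF_8));

        // Offsets entered by hand; the text is ASCII, so chars equal bytes.
        int start = xml.indexOf("<b id=\"b1\"");
        int end = xml.indexOf("</b>", start) + "</b>".length();
        IdIndex index = new IdIndex();
        index.put("b1", start, end - start);

        // prints <b id="b1"><c>blip</c></b>
        System.out.println(new String(index.fetch(file, "b1"), StandardCharsets.UTF_8));
        Files.delete(file);
    }
}
```

The fetched bytes would still have to be wrapped in the ancestors' start
and closing tags before parsing, so that namespace declarations and the
like are in scope.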
But is this good XML practice?
I am clearly losing the ability to apply full validation (that is, I
could only revalidate the element's content; is a schema interchangeable
in terms of root element the way a DTD is? What about RELAX NG schemas?).
Finally, to the Xerces makers/users: how do I get the byte position of
an element start tag that the SAX parser has just handed to me?
Thanks.
Paul
On Thursday, July 25, 2002, at 02:58, Paul Libbrecht wrote:
> Although this request is only about parsing, I think it is general
> enough to be posted to this list.
>
> Here's a simple problem: one of our applications reads a series of XML
> documents, all using the same DTD declarations. If I understand
> correctly, at least from the SAX or JAXP interfaces, the parser will
> read the DTD(s) completely every time.
> This looks like a real waste of resources. Do some parsers, and
> preferably a standard, have a way to avoid this and re-use the same
> parsed DTD every time?
>
>
> A related issue arises in building an XML editor where you offer the
> user the ability to edit the source code: what you would like is for
> the internal XML representation to be updated quickly (ideally all
> the time). For this, however, we would need the parser to be able to
> parse only, say, the biggest element containing the changes.
> And for this, some more information would have to be kept, at least
> something similar to a stack of namespaces for each location.