On Wednesday 26 October 2005 10:04 pm, boblq wrote:
> On Wednesday 26 October 2005 12:10 pm, Stewart Stremler wrote:
> > begin quoting boblq as of Wed, Oct 26, 2005 at 11:49:22AM -0700:
> > > On Tuesday 25 October 2005 08:59 am, Stewart Stremler wrote:
> > > > Writing your own XML parser that tries to put out meaningful error
> > > > messages is (a) seen as a waste of time as you're writing a redundant
> > > > parser and (b) is apt to be buggy and error-prone itself, making it
> > > > worse than what you have to deal with already.
> > >
> > > UH, duh. Isn't that one of the reasons why Open Source exists?
> >
> > Depends on who you are. Most folk want open-source because it results in
> > all software being (essentially) free-as-in-beer.
> >
> > > There are pretty decent parsers out there already, e.g.
> > >
> > > expat http://expat.sourceforge.net/
> > >
> > > SAX http://www.saxproject.org/
> > >
> > > You could contribute to these projects by improving the
> > > error reporting ...
> >
> > I haven't looked at these,
>
> > > Why would you need to write your own XML parser?
> >
> > Often, good error reporting isn't something that can be bolted on to a
> > system afterwards.
>
> How do you know about these when you have not looked at them ...
> often? Maybe just once you should look at the code instead of
> blindly citing your prejudices.
>
> Too much to ask I guess.
>
> BobLQ

Ok, I downloaded expat. Did the usual

    ./configure
    make
    make install

That took about 10 minutes. The libs are in /usr/local/lib,
where one would expect them to be.

I did not compile and run the examples, but a glance
at the source code suggests they would likely work.
They certainly look easy enough to understand.

Looking in the docs, I find these functions, which look
to me like a reasonable set of hooks on which you can
build whatever you want ...
------------------------------------------------------------
Parse position and error reporting functions

These are the functions you'll want to call when
the parse functions return XML_STATUS_ERROR
(a parse error has occurred), although the position
reporting functions are useful outside of errors. The
position reported is the byte position (in the original
document or entity encoding) of the first of the
sequence of characters that generated the current
event (or the error that caused the parse functions
to return XML_STATUS_ERROR). The exceptions are
callbacks triggered by declarations in the document
prologue, in which case the exact position reported
is somewhere in the relevant markup, but not necessarily
as meaningful as for other events.

The position reporting functions are accurate only
outside of the DTD. In other words, they usually return
bogus information when called from within a DTD declaration
handler.

enum XML_Error XMLCALL
XML_GetErrorCode(XML_Parser p);

Return what type of error has occurred.

const XML_LChar * XMLCALL
XML_ErrorString(enum XML_Error code);

Return a string describing the error corresponding to code.
The code should be one of the enums that can be returned
from XML_GetErrorCode.

long XMLCALL
XML_GetCurrentByteIndex(XML_Parser p);

Return the byte offset of the position. This always corresponds
to the values returned by XML_GetCurrentLineNumber and
XML_GetCurrentColumnNumber.

int XMLCALL
XML_GetCurrentLineNumber(XML_Parser p);

Return the line number of the position. The first line is reported as 1.

int XMLCALL
XML_GetCurrentColumnNumber(XML_Parser p);

Return the offset, from the beginning of the current line,
of the position.

int XMLCALL
XML_GetCurrentByteCount(XML_Parser p);

Return the number of bytes in the current event. Returns 0 if the
event is inside a reference to an internal entity and for the end-tag
event for empty element tags (the latter can be used to distinguish
empty-element tags from empty elements using separate start and
end tags).

const char * XMLCALL
XML_GetInputContext(XML_Parser p,
                    int *offset,
                    int *size);

Returns the parser's input buffer, sets the integer pointed at by
offset to the offset within this buffer of the current parse position,
and sets the integer pointed at by size to the size of the returned buffer.

This should only be called from within a handler during an active parse
and the returned buffer should only be referred to from within the handler
that made the call. This input buffer contains the untranslated bytes of
the input.

Only a limited amount of context is kept, so if the event triggering a
call spans over a very large amount of input, the actual parse position
may be before the beginning of the buffer.
----------------------------------------------------------------------------------
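
To make the excerpt a bit more concrete, here is a minimal sketch of
those hooks in use. It is an illustration under assumptions, not code
from the expat distribution: it goes through Python's standard
xml.parsers.expat binding rather than the C API quoted above, and the
helper name describe_parse_error and the sample input are made up for
the example. The binding's ErrorString and ErrorByteIndex, and the
lineno/offset attributes on the exception, report the same error text
and position information that the C functions expose.

```python
import xml.parsers.expat as expat


def describe_parse_error(data):
    """Parse `data` (bytes) with expat.

    Hypothetical helper for illustration: returns None if the input is
    well-formed, otherwise a dict saying where and why the parse stopped.
    """
    parser = expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True: this is the final chunk of input
    except expat.ExpatError as err:
        return {
            "code": err.code,                        # numeric expat error code
            "message": expat.ErrorString(err.code),  # human-readable text
            "line": err.lineno,                      # first line is 1
            "column": err.offset,                    # offset within that line
            "byte": parser.ErrorByteIndex,           # byte offset in the input
        }
    return None


if __name__ == "__main__":
    # Sample input (made up): <b> is never closed before </a>.
    bad = b"<a>\n  <b></a>\n"
    info = describe_parse_error(bad)
    if info is not None:
        print("line %(line)d, column %(column)d (byte %(byte)d): %(message)s" % info)
```

Whether that is enough to build really good error messages is a
separate question; the sketch only suggests that the error code, its
description, and the position are all reachable from the outside
without touching the parser's internals.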
Just my take after, say, thirty minutes.
I did use expat four or five years ago on a project and
it worked fine for me then. That was back when James Clark
first released it. I have no idea how much it has evolved
since then, but certainly a lot of people seem to use it
without much complaint.
So it goes,
BobLQ
--
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-lpsg