David Lichteblau <[EMAIL PROTECTED]> writes:

> Quoting Peter K.Lee (saint-jz+Qvnm4FdNWk0Htik3J/[EMAIL PROTECTED]):
>> > * What does the "char-sets" column mean? It says "UTF-8 w/o Unicode" for
>> > cxml; I can't make sense of that.
>> Me neither. :) But that is how it is reported in the cxml page.
>
> I take that to mean that the CXML documentation is not elaborate enough
> on this. Do you have a suggestion where in the documentation to write
> more about it? What kind of information would you have liked to see?

Yes, what I would've liked to see was a section with Features (or
capabilities, or highlights, whatever) as well as a section with
Limitations (things it does NOT do). I must have grabbed the "UTF-8 w/o
Unicode" from the Recent Changes section, erroneously condensing the part
about "Lisp Implementations w/o Unicode support" into a garbage phrase. :)

>> Other parsers make cursory notes about the character sets they support
>> as well. I'd be happy to update the column to make it more sane if
>> someone can shed some light on what it really means...
>
> Well, partly I was asking what the column was meant to be about.
>
> UTF-8 is not a character set, it's an encoding.

Ah, thanks for the clarification. I will correct the column to
"encodings". What is a character set then, and does it play any role in
XML parsing?

> * The "character set" XML parsers use is, by definition, Unicode.
>   Every XML parser must deal with Unicode.
>
> * A different question is which "encodings" a parser supports. Now, every
>   parser is required by the spec to support both UTF-8 and UTF-16.
>   If it doesn't, that's a topic for a bugs section, not so much for a
>   features comparison. In a feature comparison, it would be interesting
>   to know which *other* encodings a parser supports.
>
>   For example, CXML seems to support iso-8859-n and koi8-r (hmm, whatever
>   that is :-)) in addition to UTF-8 and UTF-16.

Where can I find info about all the encodings that CXML supports (or
should I assume that the above list is complete)?

>   (Ideally, an XML parser in Lisp [on a Unicode-aware implementation]
>   would support all external formats supported by the host Lisp, but
>   that can be a portability issue.)
>
> * Yet another question is which encodings the serializer supports.
>
>   For example, CXML has built-in support for a UTF-8 serializer (even on
>   non-Unicode-aware Lisps) and leaves all other encodings to the host
>   Lisp.
>   (Prepend your own XML declarations and use a character stream
>   sink with the external-format you need.)

I can't say I completely grok the parser encoding vs. serializer
encoding. How would you recommend I categorize the encoding support in
evaluating the XML parser?

>> > * Somehow I'd like a column "Makes an effort to conform to the
>> > standards". AFAIK only CL-XML and CXML qualify for a "yes" there.
>>
>> I'm not exactly sure how to quantify "making an effort to conform to
>> the standards". It appears that XML syntax is a particular standard
>> that all the XML parsing libraries conform to, and the rest of the
>
> Well, there is indeed a standard for XML 1.0
>   http://www.w3.org/TR/REC-xml/
> and there is a very good test suite for that standard
>   http://www.w3.org/XML/Test/
>
>> "techniques" of parsing vary widely. If the XML parser does not do
>> validation,
>
> No, there are validating and non-validating parsers. The XML test suite
> has tests for both of them. It's fine for a parser to state that it
> doesn't support validation; it is still a conforming non-validating
> parser.
>
>> or provide the W3C DOM API, does that mean it is not
>> making an effort to conform to the standards?
>
> An XML parser does not have to implement DOM by any means. It is
> definitely an optional feature. If it does claim to implement it, it
> should pass the DOM test suite, however.
>
> Same for XML namespaces. That is also an optional, separate
> specification and covered by specially tagged tests in the XML
> conformance test suite.

Soon I hope to have actual conformance testing take place on each XML
parser library and have the results reflected in the comparison report.
In the meantime, I hope the omission of conformance results is
forgivable. I do value the importance of that factor in properly
evaluating the completeness of the parsers.

-Peter
_______________________________________________
Gardeners mailing list
[email protected]
http://www.lispniks.com/mailman/listinfo/gardeners
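
To make the character set vs. encoding point, and the parser-encoding vs. serializer-encoding distinction, a bit more concrete, here is a minimal sketch. It assumes Python's standard-library `xml.etree.ElementTree` purely as a stand-in XML library; it is not CXML and says nothing about CXML's API. The names `TEXT`, `make_doc`, `inputs`, `parsed`, `out_utf8` and `out_latin1` are hypothetical, chosen for illustration. The same Unicode text is stored in three encodings (UTF-8 and UTF-16, which every parser must read, plus ISO-8859-1 as an "other" encoding), parsed back to identical characters, and then written out again in an output encoding chosen independently of the input.

```python
import xml.etree.ElementTree as ET

# Character set side: XML content is always a sequence of Unicode
# characters, however it was stored on disk or sent over the wire.
TEXT = "caf\u00e9"  # "café"


def make_doc(encoding: str) -> bytes:
    """Build one small XML document, declare `encoding` in its XML
    declaration, and encode the whole document as bytes in that encoding."""
    source = f'<?xml version="1.0" encoding="{encoding}"?><w>{TEXT}</w>'
    return source.encode(encoding)


# Parser side ("which encodings can it read?"): the same characters arrive
# as different byte sequences, and the parser decodes each back to Unicode.
inputs = {enc: make_doc(enc) for enc in ("UTF-8", "UTF-16", "ISO-8859-1")}
parsed = {enc: ET.fromstring(data).text for enc, data in inputs.items()}

# Serializer side ("which encodings can it write?"): a separate choice made
# when turning the parsed tree back into bytes. Here the input was UTF-16.
root = ET.fromstring(inputs["UTF-16"])
out_utf8 = ET.tostring(root, encoding="utf-8")         # ElementTree omits the declaration for UTF-8
out_latin1 = ET.tostring(root, encoding="iso-8859-1")  # ElementTree adds an XML declaration here

for enc, text in parsed.items():
    print(f"{enc:>10} input: {len(inputs[enc])} bytes -> {text!r}")
print(out_utf8)
print(out_latin1)
```

Reading a UTF-16 document and writing it back out as ISO-8859-1 is deliberate: the set of encodings a parser accepts and the set a serializer can emit are two separate questions, while the characters in between stay the same.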
