None of this is news to me, of course. I'm just trying to encourage precision of error reporting so we can address the problems most efficiently.
Note that namespaces are unavoidably in conflict with DTDs, which only went as far as reserving the colon character for later use by namespaces. Unfortunately the W3C didn't see fit to deprecate DTDs, and there is ancient code that still uses them.

When discussing validity, we do need to be clear about whether we mean schema validity or DTD validity, and to distinguish both from well-formedness and from correctness.

I'm just reminding folks to be careful about the use of these specific terms. "Invalid" might not be our problem to solve, if the output is otherwise correct. Not-well-formed is almost always our fault. Incorrect varies depending on why it's incorrect.

________________________________
From: Mukul Gandhi <muk...@apache.org>
Sent: Tuesday, January 9, 2024 12:27:41 AM
To: Joseph Kesselman <kesh...@alum.mit.edu>
Cc: Stanimir Stamenkov <s7a...@netscape.net>; j-users@xalan.apache.org <j-users@xalan.apache.org>
Subject: Re: supplementary characters emojis, etc turned ino surrogate pairs

Hi Joseph,
I've just felt like responding to the points of yours quoted below.

On Tue, Jan 9, 2024 at 5:28 AM Joseph Kesselman <kesh...@alum.mit.edu> wrote:
>If an XML tool complains that a document is not valid that means it doesn't
>match the DTD or schema that describes its expected structure, not that it
>isn't correct XML. It's better to avoid using the term valid unless you mean
>Valid in the sense XML does.

I agree completely with what you've written above. But IMHO, I wanted to add a little of my perspective, as follows.

1) What it means to have DTD (defined within the specs https://www.w3.org/TR/xml/ (A), and https://www.w3.org/TR/2006/REC-xml11-20060816/ (B)) or W3C XML Schema (referenced at https://www.w3.org/XML/Schema) validation, when using XML documents

DTD (defined within the XML 1.0 (A) and 1.1 (B) specs) is also an XML schema technology, similar to W3C XML Schema.
But as a schema language, W3C XML Schema (as compared to DTD) is more suitable for modern XML use cases. Most new XML applications these days use W3C XML Schema instead of DTD. But DTD has a concept of (custom) XML entity definitions and references (these are mainly like macro substitutions done by an XML parser) that W3C XML Schema doesn't have. XML entity definitions and references within an XML document are not an XML document validation concept but an XML concept. Therefore I feel that, from the perspective of XML document validation, W3C XML Schema has all the features that DTD has, and much more.

2) XML namespaces technology (https://www.w3.org/TR/xml-names/)

I'm just mentioning this to say that XML namespaces are at an XML stack layer similar to W3C XML Schema. And of course, we have the XSLT language (XalanJ is all about XSLT and XPath), which is at the same conceptual granularity as W3C XML Schema.

--
Regards,
Mukul Gandhi
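
To make the terminology used in this thread concrete, here is a minimal sketch of how "not well-formed" and "not DTD-valid" can be told apart in Java. It assumes the JAXP parser bundled with the JDK; the class name ValidityDemo and the method classify are hypothetical names made up for this illustration, not part of Xalan-J. It relies on the standard SAX convention that validity problems arrive through ErrorHandler.error() while well-formedness problems are fatal errors. It does not address schema validity or correctness of the output.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class ValidityDemo {

    // Hypothetical helper: returns "not well-formed",
    // "well-formed but not DTD-valid", or "DTD-valid".
    static String classify(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // turns on DTD validation (not W3C XML Schema)
        DocumentBuilder builder = factory.newDocumentBuilder();

        final boolean[] validityErrorSeen = {false};
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { /* ignored in this sketch */ }
            // Recoverable errors: this is where validity violations are reported.
            @Override public void error(SAXParseException e) { validityErrorSeen[0] = true; }
            // Fatal errors: this is where well-formedness violations are reported.
            @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });

        try {
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            return "not well-formed";
        }
        return validityErrorSeen[0] ? "well-formed but not DTD-valid" : "DTD-valid";
    }

    public static void main(String[] args) throws Exception {
        // Internal DTD subset, so no external resources are fetched.
        String dtd = "<!DOCTYPE a [<!ELEMENT a (b)><!ELEMENT b EMPTY>]>";
        System.out.println(classify(dtd + "<a><b/></a>")); // matches the DTD
        System.out.println(classify(dtd + "<a/>"));        // parses, but violates the content model
        System.out.println(classify("<a><b></a>"));        // mismatched tags
    }
}
```

Only the third document would be "our fault" in the sense discussed above; the second is invalid against its DTD but is still perfectly parseable XML.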