None of this is news to me, of course. I'm just trying to encourage precision 
of error reporting so we can address the problems most efficiently.



Note that namespaces are unavoidably in conflict with DTDs, which only went as 
far as reserving the colon character for their later use. Unfortunately the W3C 
didn't see fit to deprecate DTDs, and there is ancient code that still uses 
them.
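
To make that conflict concrete, here is a minimal Java sketch using only the 
JDK's built-in JAXP parser. The class name, the urn:example namespace, and the 
tiny inline DTD are all invented for illustration. A DTD can only match the 
literal name x:a, so a document that binds the same namespace to a different 
prefix is the same element as far as namespaces are concerned, yet it should 
fail DTD validation.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

final class PrefixVsDtd {
    // Hypothetical DTD: it can only name the element by its literal, prefixed name.
    static final String DTD =
        "<!DOCTYPE x:a [<!ELEMENT x:a EMPTY><!ATTLIST x:a xmlns:x CDATA #IMPLIED>]>";
    static final String SAME_PREFIX  = DTD + "<x:a xmlns:x='urn:example'/>";
    static final String OTHER_PREFIX = DTD + "<y:a xmlns:y='urn:example'/>";

    private static Element parseRoot(String xml, boolean dtdValidate) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        f.setValidating(dtdValidate);   // true = report DTD validity errors
        DocumentBuilder b = f.newDocumentBuilder();
        // Treat validity errors as failures instead of letting them pass silently.
        b.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) throws SAXParseException { throw e; }
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        return b.parse(new InputSource(new StringReader(xml))).getDocumentElement();
    }

    static boolean isDtdValid(String xml) {
        try {
            parseRoot(xml, true);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    // Namespace view of the root element: {namespace-uri}local-name.
    static String expandedName(String xml) {
        try {
            Element root = parseRoot(xml, false);
            return "{" + root.getNamespaceURI() + "}" + root.getLocalName();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String xml : new String[] { SAME_PREFIX, OTHER_PREFIX }) {
            System.out.println(expandedName(xml) + " DTD-valid=" + isDtdValid(xml));
        }
    }
}
```

Both documents have the same expanded root name; only the one that happens to 
use the prefix baked into the DTD passes DTD validation.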

When discussing validity, we do need to be clear about whether we mean schema 
validity or DTD validity... And distinguish both from well-formedness, and from 
correctness.

I'm just reminding folks to be careful about the use of these specific terms. 
"Invalid" might not be our problem to solve, if the output is otherwise 
correct. Not-well-formed is almost always our fault. Incorrect varies depending 
on why it's incorrect.
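
As a rough illustration of how these terms differ in practice, the following 
Java sketch checks the same small documents three ways with the JDK's JAXP 
APIs. The class name, the sample documents, and the toy DTD and schema are 
assumptions made up for this example; "correctness" is not shown because no 
parser can check it.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

final class XmlChecks {
    // Toy inputs, invented for this sketch.
    static final String DTD = "<!DOCTYPE a [<!ELEMENT a (b)><!ELEMENT b EMPTY>]>";
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='a'><xs:complexType><xs:sequence>"
      + "<xs:element name='b'/>"
      + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Well-formed: a non-validating parse succeeds.
    static boolean isWellFormed(String xml) { return parses(xml, false); }

    // DTD-valid: a validating parse reports no errors.
    static boolean isDtdValid(String xml) { return parses(xml, true); }

    private static boolean parses(String xml, boolean dtdValidate) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            f.setValidating(dtdValidate);
            DocumentBuilder b = f.newDocumentBuilder();
            // Escalate validity errors so they are not merely logged.
            b.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXParseException { throw e; }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            b.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    // Schema-valid: the document validates against a W3C XML Schema.
    static boolean isSchemaValid(String xml, String xsd) {
        try {
            SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            sf.newSchema(new StreamSource(new StringReader(xsd)))
              .newValidator()
              .validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] docs = { "<a><b></a>", "<a><b/></a>", DTD + "<a><b/></a>", DTD + "<a><c/></a>" };
        for (String xml : docs) {
            System.out.println(xml + " wellFormed=" + isWellFormed(xml)
                + " dtdValid=" + isDtdValid(xml)
                + " schemaValid=" + isSchemaValid(xml, XSD));
        }
    }
}
```

Note that a well-formed document with no DOCTYPE at all is still reported as 
not DTD-valid, which is one reason "invalid" on its own says little about 
whether the output is usable.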


________________________________
From: Mukul Gandhi <muk...@apache.org>
Sent: Tuesday, January 9, 2024 12:27:41 AM
To: Joseph Kesselman <kesh...@alum.mit.edu>
Cc: Stanimir Stamenkov <s7a...@netscape.net>; j-users@xalan.apache.org 
<j-users@xalan.apache.org>
Subject: Re: supplementary characters emojis, etc turned ino surrogate pairs

Hi Joseph,
   I'd like to respond to the points of yours quoted below.

On Tue, Jan 9, 2024 at 5:28 AM Joseph Kesselman <kesh...@alum.mit.edu> wrote:
>If an XML tool complains that a document is not valid, that means it doesn't 
>match the DTD or schema that describes its expected structure, not that it 
>isn't correct XML. It's better to avoid using the term valid unless you mean 
>Valid in the sense XML does.

I agree completely with what you've written above. But I wanted to add
a bit of my own perspective, as follows:

1) What it means to have DTD (defined within the specs
https://www.w3.org/TR/xml/ (A), and
https://www.w3.org/TR/2006/REC-xml11-20060816/ (B)) or W3C XML Schema
(referenced at https://www.w3.org/XML/Schema) validation when using
XML documents

DTD (defined within the XML 1.0 (A) and 1.1 (B) specs) is also an XML
schema language, similar to W3C XML Schema. But as a schema language,
W3C XML Schema is better suited than DTD to modern XML use cases. Most
new XML applications these days use W3C XML Schema instead of DTD.

But DTD has a concept of (custom) XML entity definitions and
references (these are mainly like macro substitutions done by an XML
parser) that W3C XML Schema doesn't have. XML entity definitions and
references within an XML document are not an XML document validation
concept but a core XML concept. Therefore I feel that, from the
perspective of XML document validation, W3C XML Schema has all the
features that DTD has, and much more.
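
To illustrate the macro-substitution point, here is a small Java sketch with
the JDK's default DOM parser. The class name, the entity name "proj", and the
sample document are assumptions made up for this example. The parser is
deliberately non-validating, which shows that entity expansion happens
independently of validation.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class EntityExpansionDemo {
    // Hypothetical document: the internal DTD subset defines one custom entity.
    static final String XML =
        "<!DOCTYPE note [<!ENTITY proj 'Xalan-J'>]>"
      + "<note>Built with &proj;.</note>";

    // Parses without validation and returns the root element's text.
    static String rootText(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setValidating(false);              // no DTD validation requested
            f.setExpandEntityReferences(true);   // the JAXP default, stated explicitly
            Document doc = f.newDocumentBuilder()
                            .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootText(XML));   // prints "Built with Xalan-J."
    }
}
```

The reference &proj; is replaced by its definition during parsing, even
though nothing here checks the document against the DTD's content rules.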

2) XML namespaces technology (https://www.w3.org/TR/xml-names/)

I'm just mentioning this to say that XML namespaces are at an XML
stack layer similar to W3C XML Schema.

And of course, we have the XSLT language (XalanJ is all about XSLT and
XPath), which is at the same conceptual granularity as W3C XML Schema.


--
Regards,
Mukul Gandhi
