Hi Michael,

Thanks for your thorough review. I'll revise my proposal ASAP based on
your feedback.


2009/3/30 Michael Glavassevich <[email protected]>:
>
> All XML characters are Unicode. If you were thinking of other character
> encodings besides UTF-*, these all get converted to Java chars on input so
> essentially Xerces is always working on UTF-16 and thus the normalization
> checker / normalizer will always see a "Unicode encoding form".
>

OK, dealing with a single encoding should make it easier. I think I got
mixed up when reading section 4.3.3 of the XML spec, which mentions some
other encodings. [1]
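
To check my understanding, here is a minimal sketch (my own illustration,
not Xerces code; the class and method names are made up). It decodes the
same character from two different byte encodings and shows that, once the
input is a Java String, a normalization check no longer depends on the
original encoding. I'm using java.text.Normalizer only as a stand-in for
whatever checker the proposal ends up with.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

// Hypothetical illustration class; nothing here is part of Xerces.
final class EncodingSketch {

    // Decode bytes that were written as ISO-8859-1.
    static String fromLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Decode bytes that were written as UTF-8.
    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Stand-in for a normalization checker: tests for NFC on the decoded chars.
    static boolean isNfc(String s) {
        return Normalizer.isNormalized(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute) in two different byte encodings.
        String a = fromLatin1(new byte[] { (byte) 0xE9 });
        String b = fromUtf8(new byte[] { (byte) 0xC3, (byte) 0xA9 });

        // After decoding, both are the same sequence of Java chars,
        // so the check sees identical input either way.
        System.out.println("same string: " + a.equals(b));
        System.out.println("NFC: " + isNfc(a));
    }
}
```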

>
> Probably something you've already realized but worth clarifying... The
> pipeline (XMLParserConfiguration [1]) is shared between the SAX and DOM (and
> perhaps one day StAX) parsers, so these features equally apply to the
> existing SAX XMLReader, JAXP SAXParser and DocumentBuilder. There's already
> a standard SAX feature defined for normalization checking [2]. We should
> probably define a Xerces' specific feature URI to cover the normalization
> function which could be set on the SAX parser, similar to the parameter
> defined in DOM Level 3 Core / Load & Save. For a DOM in memory the
> normalizing / normalization checking functions would be invoked by setting
> the parameters on the DOMConfiguration and calling normalizeDocument(). In
> addition to plugging in the XNI component here it would also involve
> updating the DOM with the normalized text. And when a DOM is loaded with an
> LSParser if the LSInput.certifiedText [3] flag is true, I believe the
> intention is that normalization processing is skipped so should have some
> way to bypass the normalization component (e.g. excluding it from the
> pipeline) when the input claims to be certified.
>
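
For the in-memory DOM case, I picture the calling side looking roughly like
the sketch below. The two parameter names are the ones defined in DOM Level 3
Core; the class and helper names are made up for illustration. Since I'm
assuming an implementation may refuse these parameters, the sketch asks
canSetParameter() first instead of setting them unconditionally.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical illustration class; not part of Xerces.
final class NormalizeDocumentSketch {

    // Turns a DOM Level 3 boolean parameter on only if the implementation
    // says it can; returns whether the parameter was actually set.
    static boolean trySet(DOMConfiguration config, String name) {
        if (config.canSetParameter(name, Boolean.TRUE)) {
            config.setParameter(name, Boolean.TRUE);
            return true;
        }
        return false;
    }

    // Parses a string into a DOM using the default JAXP DocumentBuilder.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // Sketch only: collapse parser setup, SAX and I/O failures.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "e" followed by U+0301 (combining acute): not in NFC.
        Document doc = parse("<root>e\u0301</root>");
        DOMConfiguration config = doc.getDomConfig();

        boolean checking = trySet(config, "check-character-normalization");
        boolean normalizing = trySet(config, "normalize-characters");

        // The parameters take effect when the document is normalized.
        doc.normalizeDocument();

        System.out.println("checking enabled: " + checking);
        System.out.println("normalizing enabled: " + normalizing);
    }
}
```

If that matches what you had in mind, then the new XNI component would be
what makes these parameters settable to true.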

I have one question about another class, XML11Char, which lets you check
whether a character is a valid XML 1.1 character. Should normalization checks
play any role in this validation?
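
To make the question concrete, here is a small sketch of the distinction I
have in mind. The isXml11Char() method below is my own stand-in that follows
the Char production in the XML 1.1 spec; it is not the actual XML11Char code,
and java.text.Normalizer again stands in for a real checker. The point is
that validity is decided one code point at a time, while normalization is a
property of a sequence.

```java
import java.text.Normalizer;

// Hypothetical illustration class; not the Xerces XML11Char implementation.
final class CharValiditySketch {

    // XML 1.1 Char production:
    // [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXml11Char(int codePoint) {
        return (codePoint >= 0x1 && codePoint <= 0xD7FF)
                || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
                || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    // True if every code point in the string matches the Char production.
    static boolean allXml11Chars(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isXml11Char(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    // Sequence-level check: is the whole string in NFC?
    static boolean isNfc(String s) {
        return Normalizer.isNormalized(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // "e" + U+0301: each code point is a valid XML 1.1 character,
        // but the sequence as a whole is not in NFC.
        String decomposed = "e\u0301";
        System.out.println("all valid chars: " + allXml11Chars(decomposed));
        System.out.println("NFC: " + isNfc(decomposed));
    }
}
```

So a per-character check alone can't tell whether text is normalized, which
is why I'm unsure whether the two belong together.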

Thanks,
Richard

[1] http://www.w3.org/TR/xml11/#charencoding

