Hi Michael,

Thanks for your thorough review. I'll revise my proposal ASAP using your feedback.

2009/3/30 Michael Glavassevich:
>
> All XML characters are Unicode. If you were thinking of other character
> encodings besides UTF-*, these all get converted to Java chars on input so
> essentially Xerces is always working on UTF-16 and thus the normalization
> checker / normalizer will always see a "Unicode encoding form".
>

OK, dealing with a single encoding should make things easier. I think I got mixed up when reading section 4.3.3 of the XML spec, which mentions some other encodings. [1]

>
> Probably something you've already realized but worth clarifying... The
> pipeline (XMLParserConfiguration [1]) is shared between the SAX and DOM (and
> perhaps one day StAX) parsers, so these features equally apply to the
> existing SAX XMLReader, JAXP SAXParser and DocumentBuilder. There's already
> a standard SAX feature defined for normalization checking [2]. We should
> probably define a Xerces' specific feature URI to cover the normalization
> function which could be set on the SAX parser, similar to the parameter
> defined in DOM Level 3 Core / Load & Save. For a DOM in memory the
> normalizing / normalization checking functions would be invoked by setting
> the parameters on the DOMConfiguration and calling normalizeDocument(). In
> addition to plugging in the XNI component here it would also involve
> updating the DOM with the normalized text. And when a DOM is loaded with an
> LSParser if the LSInput.certifiedText [3] flag is true, I believe the
> intention is that normalization processing is skipped so should have some
> way to bypass the normalization component (e.g. excluding it from the
> pipeline) when the input claims to be certified.
>

I have one question about another class, XML11Char, which lets you check whether a character is a valid XML 1.1 character. Should normalization checks play any role in that validation?
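
To make the question concrete, here is a rough sketch of the distinction I have in mind. It does not use Xerces at all: the class name NormalizationSketch and its helpers are made up for illustration, the range check is only my reading of the Char production in the XML 1.1 spec (not the actual XML11Char code), and the normalization test relies on java.text.Normalizer from the JDK (Java 6 or later).

```java
import java.text.Normalizer;

public class NormalizationSketch {

    // Hypothetical stand-in for a per-character validity check, following the
    // XML 1.1 Char production:
    // [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXml11Char(int c) {
        return (c >= 0x1 && c <= 0xD7FF)
            || (c >= 0xE000 && c <= 0xFFFD)
            || (c >= 0x10000 && c <= 0x10FFFF);
    }

    // Walks the string by code point so surrogate pairs count as one character.
    static boolean allXml11Chars(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isXml11Char(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    // Normalization is a property of the whole sequence, not of one character.
    static boolean isNfc(String s) {
        return Normalizer.isNormalized(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String composed = "\u00E9";    // e-acute as a single code point
        String decomposed = "e\u0301"; // 'e' followed by combining acute accent

        System.out.println("composed:   valid chars=" + allXml11Chars(composed)
                + ", NFC=" + isNfc(composed));
        System.out.println("decomposed: valid chars=" + allXml11Chars(decomposed)
                + ", NFC=" + isNfc(decomposed));
    }
}
```

In this sketch both strings pass the per-character check, but only the composed form is reported as NFC, which is why I am unsure whether a class that looks at one character at a time is the right place for normalization checks.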

Thanks,
Richard

[1] http://www.w3.org/TR/xml11/#charencoding
