Hi Richard,

Richard Kelly <[email protected]> wrote on 03/31/2009 02:44:03 AM:
> Hi Michael,
>
> Thanks for your thorough review, I'll revise my proposal ASAP using
> your feedback.
>
>
> 2009/3/30 Michael Glavassevich <[email protected]>:
> >
> > All XML characters are Unicode. If you were thinking of other character
> > encodings besides UTF-*, these all get converted to Java chars on input so
> > essentially Xerces is always working on UTF-16 and thus the normalization
> > checker / normalizer will always see a "Unicode encoding form".
> >
>
> Ok, dealing with a single encoding should make it easier. I think I got
> mixed up when reading section 4.3.3 of the XML spec, which mentions some
> other encodings. [1]
>
> >
> > Probably something you've already realized but worth clarifying... The
> > pipeline (XMLParserConfiguration [1]) is shared between the SAX and DOM (and
> > perhaps one day StAX) parsers, so these features equally apply to the
> > existing SAX XMLReader, JAXP SAXParser and DocumentBuilder. There's already
> > a standard SAX feature defined for normalization checking [2]. We should
> > probably define a Xerces-specific feature URI to cover the normalization
> > function which could be set on the SAX parser, similar to the parameter
> > defined in DOM Level 3 Core / Load & Save. For a DOM in memory the
> > normalizing / normalization checking functions would be invoked by setting
> > the parameters on the DOMConfiguration and calling normalizeDocument(). In
> > addition to plugging in the XNI component here it would also involve
> > updating the DOM with the normalized text. And when a DOM is loaded with an
> > LSParser if the LSInput.certifiedText [3] flag is true, I believe the
> > intention is that normalization processing is skipped so should have some
> > way to bypass the normalization component (e.g. excluding it from the
> > pipeline) when the input claims to be certified.
> >
>
> I had one question about another class called XML11Char which lets you check
> if a character is a valid XML 1.1 character.
> Should normalization checks play any role in this validation?

XML11Char and its counterpart XMLChar are used for checking well-formedness:
the set of rules which all XML documents must conform to (otherwise they're
not XML). Well-formedness checking will have logically occurred before the
normalization checker / normalizer sees the data. I wouldn't expect that you
would need to call any of these methods again in that context.

> Thanks,
> Richard
>
> [1] http://www.w3.org/TR/xml11/#charencoding

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [email protected]
E-mail: [email protected]
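To illustrate the point that the normalization checker only ever sees UTF-16, here is a minimal Java sketch using the JDK's `java.text.Normalizer` (a standard JDK class, not Xerces code). The same text arrives as ISO-8859-1 and as UTF-8 bytes, decodes to one identical Java `String`, and is then checked. It assumes NFC as the target form, which is a simplification: XML 1.1 "full normalization" imposes requirements beyond plain NFC. The class name `NormalizationCheckSketch` and the helper `isNfc` are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

public class NormalizationCheckSketch {

    // Simplified check: treats NFC as the target form. XML 1.1's
    // "fully normalized" requirement is stricter than plain NFC.
    static boolean isNfc(String text) {
        return Normalizer.isNormalized(text, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // The same word "café" encoded two different ways on the wire
        byte[] latin1 = {0x63, 0x61, 0x66, (byte) 0xE9};
        byte[] utf8 = "caf\u00E9".getBytes(StandardCharsets.UTF_8);

        String fromLatin1 = new String(latin1, StandardCharsets.ISO_8859_1);
        String fromUtf8 = new String(utf8, StandardCharsets.UTF_8);

        // After decoding, both inputs are the identical UTF-16 string,
        // so the checker never needs to know the original encoding
        System.out.println(fromLatin1.equals(fromUtf8)); // true

        // Precomposed U+00E9 is NFC; "e" followed by U+0301 (combining acute) is not
        String decomposed = "cafe\u0301";
        System.out.println(isNfc(fromLatin1)); // true
        System.out.println(isNfc(decomposed)); // false

        // Normalizing the decomposed form yields the precomposed string
        String normalized = Normalizer.normalize(decomposed, Normalizer.Form.NFC);
        System.out.println(normalized.equals(fromLatin1)); // true
    }
}
```

In a Xerces pipeline the check would run on character data after the scanner has already verified well-formedness, so nothing in this sketch re-validates characters.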

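The configuration points mentioned in the thread (the standard SAX normalization-checking feature, the DOM Level 3 `normalize-characters` and `check-character-normalization` parameters together with `normalizeDocument()`, and `LSInput.certifiedText`) can be exercised through the standard JAXP/DOM APIs roughly as follows. This is a sketch, not the proposed Xerces implementation: the class and helper names are hypothetical, no Xerces-specific normalization feature URI is used because none has been defined yet, and since support for these settings is optional the code probes with `canSetParameter` and catches the SAX exceptions instead of assuming they are available.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

public class NormalizationSettingsSketch {

    // Standard SAX2 feature URI for Unicode normalization checking
    static final String SAX_NORMALIZATION_CHECKING =
            "http://xml.org/sax/features/unicode-normalization-checking";

    // DOM Level 3 Core parameters for normalizing / normalization checking
    static final String[] DOM_PARAMS = {"normalize-characters", "check-character-normalization"};

    // Reports whether the platform's SAX reader accepts the feature set to true
    static String probeSaxFeature() {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setFeature(SAX_NORMALIZATION_CHECKING, true);
            return "supported";
        } catch (SAXNotRecognizedException e) {
            return "not recognized";
        } catch (SAXNotSupportedException e) {
            return "not supported";
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    // Obtains an LS-capable DOM implementation via the JAXP DocumentBuilder
    // (assumes the platform implementation also implements DOMImplementationLS)
    static DOMImplementationLS lsImplementation() {
        try {
            return (DOMImplementationLS) DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().getDOMImplementation();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Parses a string with an LSParser; certified = the input claims to be
    // already normalized, so a parser may skip normalization processing
    static Document parse(String xml, boolean certified) {
        DOMImplementationLS ls = lsImplementation();
        LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        LSInput input = ls.createLSInput();
        input.setCharacterStream(new StringReader(xml));
        input.setCertifiedText(certified);
        return parser.parse(input);
    }

    // Enables whichever normalization parameters the implementation can set
    // to true, then runs normalizeDocument(); returns how many were enabled
    static int normalizeInMemory(Document doc) {
        DOMConfiguration config = doc.getDomConfig();
        int enabled = 0;
        for (String name : DOM_PARAMS) {
            if (config.canSetParameter(name, Boolean.TRUE)) {
                config.setParameter(name, Boolean.TRUE);
                enabled++;
            }
        }
        doc.normalizeDocument();
        return enabled;
    }

    public static void main(String[] args) {
        System.out.println("SAX normalization checking: " + probeSaxFeature());
        Document doc = parse("<root>cafe\u0301</root>", false);
        System.out.println("DOM normalization parameters enabled: " + normalizeInMemory(doc));
        System.out.println("root element: " + doc.getDocumentElement().getNodeName());
    }
}
```

Probing rather than setting unconditionally keeps the sketch portable across parsers that do not yet implement these options; the printed support results depend on the parser on the classpath.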