Hi all,

I'm finally finishing my exams this week, so I'll be able to dedicate more time to this project. I thought I'd give an update on where I'm at.

So far, I've done the following:
- Created a character normalization component that performs Unicode normalization.
- Modified XML11Configuration to handle the new features and to add and remove the component from the pipeline when appropriate.
- Modified AbstractSAXParser to handle the SAX character normalization flags.
- Created basic test files to ensure the features are working as expected.
- Extended the character normalization component to deal with composing characters.
- Updated the XML messages for character normalization errors.
- Built the ICU4J component and updated build.xml to use it.
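To make the kind of check concrete, here is a minimal sketch. It is not the component's actual code: it uses the JDK's java.text.Normalizer rather than ICU4J, the class and method names are hypothetical, and the "composing character" test is only an approximation based on the Unicode combining-mark categories, which is not the same as the definition the XML 1.1 specification refers to.

```java
import java.text.Normalizer;

// Hypothetical sketch; class and method names are illustrative only.
final class NormalizationCheckSketch {

    /** Returns true if the text is already in Unicode Normalization Form C. */
    static boolean isNfc(CharSequence text) {
        return Normalizer.isNormalized(text, Normalizer.Form.NFC);
    }

    /**
     * Approximation (assumption): treats any combining mark (general
     * categories Mn, Mc, Me) at the start of the text as a composing character.
     */
    static boolean startsWithCombiningMark(CharSequence text) {
        if (text.length() == 0) {
            return false;
        }
        int type = Character.getType(Character.codePointAt(text, 0));
        return type == Character.NON_SPACING_MARK
                || type == Character.COMBINING_SPACING_MARK
                || type == Character.ENCLOSING_MARK;
    }

    /** Combines both checks: NFC and no leading combining mark. */
    static boolean isFullyNormalized(CharSequence text) {
        return isNfc(text) && !startsWithCombiningMark(text);
    }

    public static void main(String[] args) {
        String composed = "caf\u00E9";      // precomposed e-acute
        String decomposed = "cafe\u0301";   // 'e' followed by combining acute
        String leadingMark = "\u0301abc";   // starts with a combining acute

        System.out.println(isFullyNormalized(composed));    // true
        System.out.println(isFullyNormalized(decomposed));  // false
        System.out.println(isFullyNormalized(leadingMark)); // false
    }
}
```

The leading-mark case is kept separate from the NFC check because a string can be in NFC and still begin with a combining mark.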
At the moment, I'm trying to map the 'relevant constructs' [1] in the XML specification to the corresponding document handler events. These constructs consist of:
1. The replacement text of all parsed entities
2. All text matching, in context, one of the following productions: CData, CharData, content, Name, Nmtoken.

After looking through the XML specification and correlating the above with the XMLDocumentHandler functions [2], I've interpreted this to mean:
- normalize the text of 'characters' events (since this event matches the replacement text, CData, CharData and content productions)
- normalize QNames and XMLAttributes in any events where they occur (this matches most Name and Nmtoken productions)
- normalize name parameters in doctypeDecl, startGeneralEntity, processingInstruction, and endGeneralEntity events (additional structures in which Name productions occur)

If anyone can think of other events in which these productions are used, I would be most grateful if you could point them out.

Thanks for all your assistance so far; it has been a great help.

regards,
Richard

[1] http://www.w3.org/TR/xml11/#sec-normalization-checking
[2] http://xerces.apache.org/xerces2-j/javadocs/xni/org/apache/xerces/xni/XMLDocumentHandler.html

2009/6/16 Michael Glavassevich <[email protected]>:
> Hi Richard,
>
> The component you're looking for is the XMLErrorReporter [1]. It will take
> care of looking up and formatting the error messages (that you've added to
> the message file, e.g. XMLMessages.properties), creating the exception,
> supplying it with the right locator information and reporting the error to
> the user's error handler. You can obtain an instance of it from the
> XMLComponentManager by querying the
> "http://apache.org/xml/properties/internal/error-reporter" property. You'll
> find plenty of examples of its usage around the Xerces source (in particular
> other classes in org.apache.xerces.impl).
>
> Thanks.
>
> [1] http://xerces.apache.org/xerces2-j/javadocs/xerces2/org/apache/xerces/impl/XMLErrorReporter.html
>
> Michael Glavassevich
> XML Parser Development
> IBM Toronto Lab
> E-mail: [email protected]
