Hi all,

I'm finally finishing my exams this week, so I'll be able to dedicate
more time to this project. I thought I'd give an update on where I'm
at.

So far, I've done this:
- Created a character normalization component that performs Unicode
normalization.
- Modified XML11Configuration to handle the new features and to add
and remove the component from the pipeline when appropriate.
- Modified AbstractSAXParser to handle the SAX character normalization flags.
- Created basic test files to ensure the features are working as expected.
- Extended the character normalization component to deal with
composing characters.
- Updated the XML messages for character normalization errors.
- Built the ICU4J component and updated build.xml to use it.
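
For anyone less familiar with what these checks involve, here is a minimal
sketch of the NFC test and the leading composing-character test. It
assumes the JDK's java.text.Normalizer (Java 6+) in place of ICU4J, and
the names NormalizationCheck, isFullyNormalized and isComposingChar are
hypothetical. The composing-character test is only an approximation: it
treats any Unicode combining mark as composing, whereas the Charmod
definition that XML 1.1 relies on is based on non-zero canonical combining
class or the ability to compose with a preceding character.

```java
import java.text.Normalizer;

/**
 * Hypothetical sketch of the checks a character normalization component
 * might perform. Uses java.text.Normalizer instead of ICU4J.
 */
public class NormalizationCheck {

    /** True if the text is already in Unicode Normalization Form C. */
    public static boolean isNfc(String text) {
        return Normalizer.isNormalized(text, Normalizer.Form.NFC);
    }

    /**
     * Approximation of a "composing character": any combining mark.
     * Assumption: the precise definition uses canonical combining class
     * and composition data, which this sketch does not consult.
     */
    public static boolean isComposingChar(int codePoint) {
        int type = Character.getType(codePoint);
        return type == Character.NON_SPACING_MARK
            || type == Character.COMBINING_SPACING_MARK
            || type == Character.ENCLOSING_MARK;
    }

    /** Approximate full normalization: NFC and no leading composing character. */
    public static boolean isFullyNormalized(String construct) {
        if (!isNfc(construct)) {
            return false;
        }
        return construct.isEmpty() || !isComposingChar(construct.codePointAt(0));
    }

    public static void main(String[] args) {
        String composed = "caf\u00E9";    // precomposed e with acute (U+00E9)
        String decomposed = "cafe\u0301"; // e followed by combining acute (U+0301)
        System.out.println(isFullyNormalized(composed));    // true
        System.out.println(isFullyNormalized(decomposed));  // false
        System.out.println(isFullyNormalized("\u0301abc")); // false
        System.out.println(
            Normalizer.normalize(decomposed, Normalizer.Form.NFC).equals(composed)); // true
    }
}
```

The third case is the one plain NFC misses: a construct that starts with a
lone combining mark is already in NFC, but it is not fully normalized.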

At the moment, I'm trying to map the 'relevant constructs' [1] in the
XML specification to the corresponding XMLDocumentHandler events. These
constructs consist of:
   1.  The replacement text of all parsed entities
   2.  All text matching, in context, one of the following
productions:  CData, CharData, content, Name, Nmtoken.

After looking through the XML specification and correlating the above
with the XMLDocumentHandler methods [2], I've interpreted this to mean:
- normalize the text of 'characters' events (since this event carries
the replacement text of parsed entities and text matching the CData,
CharData and content productions)
- normalize QNames and XMLAttributes in any events where they occur
(this matches most Name and Nmtoken productions)
- normalize the name parameters of doctypeDecl (the root element name),
startGeneralEntity and endGeneralEntity (the entity name), and
processingInstruction (the target), since these are additional places
in which Name productions occur
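
The mapping above can be sketched as a pass-through filter that checks
each relevant construct before forwarding the event. This is a simplified
illustration, not the Xerces API: SimpleHandler, NoOpHandler and
NormalizationCheckingFilter are hypothetical, the real XMLDocumentHandler
methods take QName, XMLAttributes, XMLString and Augmentations rather
than plain Strings and Maps, and java.text.Normalizer stands in for ICU4J.
The filter only checks and records problems; a rewriting variant would
call Normalizer.normalize instead. doctypeDecl and endGeneralEntity would
follow the same pattern.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for a few XMLDocumentHandler events.
interface SimpleHandler {
    void characters(String text);
    void startElement(String qname, Map<String, String> attributes);
    void processingInstruction(String target, String data);
    void startGeneralEntity(String name);
}

// Downstream handler that ignores every event.
class NoOpHandler implements SimpleHandler {
    public void characters(String text) {}
    public void startElement(String qname, Map<String, String> attributes) {}
    public void processingInstruction(String target, String data) {}
    public void startGeneralEntity(String name) {}
}

// Checks each relevant construct for NFC, records problems, then forwards the event.
class NormalizationCheckingFilter implements SimpleHandler {
    private final SimpleHandler next;
    private final List<String> problems = new ArrayList<String>();

    NormalizationCheckingFilter(SimpleHandler next) {
        this.next = next;
    }

    List<String> problems() {
        return problems;
    }

    private void check(String kind, String value) {
        if (!Normalizer.isNormalized(value, Normalizer.Form.NFC)) {
            problems.add(kind + ": " + value);
        }
    }

    public void characters(String text) {
        check("content", text);                     // CharData, CData, replacement text
        next.characters(text);
    }

    public void startElement(String qname, Map<String, String> attributes) {
        check("element name", qname);               // Name
        for (Map.Entry<String, String> a : attributes.entrySet()) {
            check("attribute name", a.getKey());    // Name
            check("attribute value", a.getValue()); // per the XMLAttributes item above
        }
        next.startElement(qname, attributes);
    }

    public void processingInstruction(String target, String data) {
        check("PI target", target);                 // PITarget is a Name
        next.processingInstruction(target, data);
    }

    public void startGeneralEntity(String name) {
        check("entity name", name);                 // Name
        next.startGeneralEntity(name);
    }
}

public class NormalizationFilterDemo {
    public static void main(String[] args) {
        NormalizationCheckingFilter filter =
            new NormalizationCheckingFilter(new NoOpHandler());
        Map<String, String> attrs = new LinkedHashMap<String, String>();
        attrs.put("lang", "fr");
        filter.startElement("cafe\u0301", attrs); // decomposed element name
        filter.characters("caf\u00E9");           // precomposed, already NFC
        filter.processingInstruction("target", "data");
        // Only the element name should be reported.
        System.out.println(filter.problems().size()); // 1
    }
}
```

One caveat the sketch ignores: a single run of text may be delivered in
several characters() calls, so a real component would need to handle
checks that span chunk boundaries.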

If anyone can think of other events in which these productions are
used, I would be most grateful if you could point them out.

Thanks for all your assistance so far; it has been a great help.
Regards,
Richard


[1] http://www.w3.org/TR/xml11/#sec-normalization-checking
[2] http://xerces.apache.org/xerces2-j/javadocs/xni/org/apache/xerces/xni/XMLDocumentHandler.html


2009/6/16 Michael Glavassevich <[email protected]>:
> Hi Richard,
>
> The component you're looking for is the XMLErrorReporter [1]. It will take
> care of looking up and formatting the error messages (that you've added to
> the message file, e.g. XMLMessages.properties), creating the exception,
> supplying it with the right locator information and reporting the error to
> the user's error handler. You can obtain an instance of it from the
> XMLComponentManager by querying the
> "http://apache.org/xml/properties/internal/error-reporter" property. You'll
> find plenty of examples of its usage around the Xerces source (in particular
> other classes in org.apache.xerces.impl).
>
> Thanks.
>
> [1]
> http://xerces.apache.org/xerces2-j/javadocs/xerces2/org/apache/xerces/impl/XMLErrorReporter.html
>
> Michael Glavassevich
> XML Parser Development
> IBM Toronto Lab
> E-mail: [email protected]
> E-mail: [email protected]
>
