Hi,

I managed to get my code working with the SAX parser this week, so at
the moment if you do something like:

  
  saxParser.setFeature("http://xml.org/sax/features/unicode-normalization-checking",
                       true);

and give the parser an XML file containing Unicode text that is not
normalized, it will do this:

error: Parse error occurred - check-character-normalization-failure
org.xml.sax.SAXException: check-character-normalization-failure
        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
        at sax.Counter.main(Unknown Source)
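
To make "not normalized" concrete, here is a minimal sketch using only the
JDK's java.text.Normalizer.  It is an illustration, not the check my
component performs: it assumes NFC is the form of interest, and the class
and method names (NormalizationDemo, isNfc) are made up for this example.

```java
import java.text.Normalizer;

class NormalizationDemo {

    /** Returns true if the text is already in Unicode Normalization Form C. */
    static boolean isNfc(String text) {
        return Normalizer.isNormalized(text, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String composed = "\u00E9";     // precomposed e-acute, a single code point
        String decomposed = "e\u0301";  // 'e' followed by a combining acute accent

        System.out.println(isNfc(composed));    // prints "true"
        System.out.println(isNfc(decomposed));  // prints "false"

        // Normalizing the decomposed form to NFC yields the precomposed form.
        String fixed = Normalizer.normalize(decomposed, Normalizer.Form.NFC);
        System.out.println(fixed.equals(composed));  // prints "true"
    }
}
```

An XML file containing the decomposed sequence is the kind of input that
would trigger the failure shown above.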

My next step is more appropriate error handling.  At the moment my
component just throws an XNI exception.  I think I should be
converting this to a SAX exception and using a locator to identify
the offending lines in the XML file.  Is that right?  The other class
that I looked at was XMLParseException, but it doesn't seem to have a
way of specifying severity.
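
For illustration, a minimal sketch of the SAX side of that idea, using only
the standard org.xml.sax API.  In plain SAX the severity is not stored on
the exception; it is implied by which ErrorHandler callback (warning,
error or fatalError) receives it.  The class and helper names below
(NormalizationErrorReporting, reportNormalizationFailure,
CollectingHandler) and the file name and position are hypothetical, and
this does not show how Xerces itself routes XNI errors.

```java
import org.xml.sax.ErrorHandler;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.LocatorImpl;

class NormalizationErrorReporting {

    /** Hypothetical helper: reports the failure as a recoverable SAX error. */
    static void reportNormalizationFailure(ErrorHandler handler, Locator locator)
            throws SAXException {
        // The locator supplies the system id, line and column of the problem.
        SAXParseException e =
            new SAXParseException("check-character-normalization-failure", locator);
        // Calling error() rather than warning() or fatalError() picks the severity.
        handler.error(e);
    }

    /** Simple handler that records what it was told, tagged with the severity. */
    static class CollectingHandler implements ErrorHandler {
        final StringBuilder log = new StringBuilder();

        public void warning(SAXParseException e) { record("warning", e); }
        public void error(SAXParseException e) { record("error", e); }
        public void fatalError(SAXParseException e) throws SAXException {
            record("fatal", e);
            throw e;  // fatal errors stop the parse
        }

        private void record(String severity, SAXParseException e) {
            log.append(severity).append(" at ")
               .append(e.getSystemId()).append(':')
               .append(e.getLineNumber()).append(':')
               .append(e.getColumnNumber()).append(": ")
               .append(e.getMessage());
        }
    }

    public static void main(String[] args) throws SAXException {
        // Stand-in for the locator a real parser would supply.
        LocatorImpl locator = new LocatorImpl();
        locator.setSystemId("example.xml");
        locator.setLineNumber(12);
        locator.setColumnNumber(5);

        CollectingHandler handler = new CollectingHandler();
        reportNormalizationFailure(handler, locator);
        System.out.println(handler.log);
    }
}
```

Reporting through an ErrorHandler also leaves it to the application to
decide whether a normalization failure should abort the parse.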

Thanks,
Richard



2009/6/10 Michael Glavassevich <[email protected]>:
> Hi Richard,
>
> Richard Kelly <[email protected]> wrote on 06/07/2009 01:16:07 PM:
>
>> Hi everyone,
>>
>> Just a bit of an update on my GSoC work.
>>
>> I've written an XNI document filter that checks for character
>> normalization.  I understand the process of inserting it into the
>> pipeline, but I am a bit unsure about the XNI component manager and
>> getting the relevant features.  Do I need to register my component
>> with the component manager to receive the feature notifications or
>> does that automatically occur?
>
> Provided that you've registered your component with the component manager
> (e.g. calling addCommonComponent() in XML11Configuration) it will receive
> notifications when features and properties are changed and will also get a
> chance to read the configuration on reset. For performance reasons it's best
> to defer registering the component (and also creating it) until the
> character normalization feature is turned on.
>
>> I've also been looking around for relevant sections of existing code
>> that I will need to update. This is what I've come up with so far:
>>
>> - Update DOMNormalizer to normalize characters if the appropriate
>> feature is set.
>> - Update DOMParserImpl to allow DOM_NORMALIZE_CHARACTERS and
>> DOM_CHECK_CHAR_NORMALIZATION features
>> - Add character normalization flags to the normalization section of
>> DOMConfigurationImpl.
>> - Update AbstractSAXParser to allow the
>> UNICODE_NORMALIZATION_CHECKING_FEATURE.
>
> That sounds about right.
>
>> Let me know if you think of any glaring omissions.
>>
>> The LSSerializer class also has some attributes related to character
>> normalization [1].  However I believe the implementation in Xerces is
>> actually from the Xalan project which doesn't implement them.  Should
>> I be looking at adding character normalization support to their
>> project too?
>
> That's not something I'd considered when I proposed the project. I think it
> would be a good addition though probably something to leave until the end if
> you still have time (assuming you're interested in working on it).
>
>> Thanks,
>> Richard
>>
>> [1] http://www.w3.org/TR/DOM-Level-3-LS/load-save.html#LS-LSSerializer
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>
> Thanks.
>
> Michael Glavassevich
> XML Parser Development
> IBM Toronto Lab
> E-mail: [email protected]
> E-mail: [email protected]

