Hi everyone,

Just a quick update on the status of my work. Over the past week I've been working on extending the normalize functions in DOM to support character normalization. I've also been implementing various changes based on the feedback on my initial code (thanks Michael!).
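
For anyone following along, here is a minimal sketch of the kind of check and rewrite that character normalization involves. It is not the code from the patch. It assumes java.text.Normalizer (Java 6 and later) is available, and the class and method names are made up for illustration. It covers plain NFC only; the DOM parameters talk about "fully normalized" text as defined in XML 1.1, which adds further constraints that this sketch does not handle.

```java
import java.text.Normalizer;

// Hypothetical helper, for illustration only; not part of Xerces.
final class NfcSketch {

    // True if the text is already in Unicode Normalization Form C.
    static boolean isNfc(CharSequence text) {
        return Normalizer.isNormalized(text, Normalizer.Form.NFC);
    }

    // Returns the NFC form of the text.
    static String toNfc(CharSequence text) {
        return Normalizer.normalize(text, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // 'e' followed by U+0301 COMBINING ACUTE ACCENT (decomposed form).
        String decomposed = "e\u0301";
        System.out.println(isNfc(decomposed));                  // prints "false"
        // NFC composes the pair into the single code point U+00E9.
        System.out.println(toNfc(decomposed).equals("\u00E9")); // prints "true"
    }
}
```

A checking pass would only need something like isNfc, while a normalizing pass would replace the text with the result of toNfc.
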
Richard

2009/7/13 Michael Glavassevich <[email protected]>:
> Hi Richard,
>
> Richard Kelly <[email protected]> wrote on 07/12/2009 05:59:36 AM:
>
>> Hi everyone,
>>
>> I've made some progress on my character normalization, and I
>> would like to get some feedback on my work to ensure I'm on the
>> right path.
>
> I've had an opportunity to review your code. What you have so far is looking
> really good. Great work!
>
>> I've uploaded the current state of my patches on JIRA [1].
>
> I do have some suggestions for improvements which I'll attach to the JIRA
> issue.
>
>> CharacterNormalizer.java is the new component that does the actual work.
>> CharacterNormalizer.patch is all the changes to existing files that I
>> needed to make.
>>
>> The relevant SAX [2] and DOM [3][4] character normalization features
>> do appear to be working as intended with these changes (except for the
>> tasks mentioned below). I've implemented it as an XNI component as we
>> discussed and use two Xerces features to control this component and
>> determine whether or not it gets added to the pipeline.
>>
>> Still on my to do list:
>> - DOM Level 3 normalizeDocument() and Node.normalize() functions:
>> These functions don't use the pipeline so I am planning to add code to
>> directly call the component from within these functions.
>> - Multiple character data stream events are not handled correctly:
>> Since Unicode characters can be larger than 16 bits they may get split
>> up across multiple calls to 'characters' events. If this happens the
>> character may not be normalized correctly. In order to avoid this, I
>> plan to use a buffer within my component to keep track of characters
>> that overlap these events.
>> - A comprehensive set of tests to check that the features work as
>> described in the standards. I've done basic testing for a number of
>> cases (which it passed successfully) but obviously we would want
>> something more comprehensive and also do some performance testing.
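
On the split-character item in the list above, the buffering idea can be sketched roughly as follows. This is only an illustration under my own assumptions, not the code in CharacterNormalizer.java: the class name SplitPairBuffer and its methods are hypothetical, and it only handles a surrogate pair cut in half at a chunk boundary (it does not hold back a base character whose combining marks arrive in the next chunk).

```java
// Hypothetical helper, for illustration only; not part of Xerces.
final class SplitPairBuffer {
    private char pendingHigh;   // high surrogate held back from the previous chunk
    private boolean hasPending;

    // Takes one 'characters'-style chunk and returns the text that is safe
    // to process now. A trailing high surrogate is held back until the
    // next chunk supplies its low surrogate.
    String accept(char[] ch, int start, int length) {
        StringBuilder out = new StringBuilder(length + 1);
        if (hasPending) {
            out.append(pendingHigh);
            hasPending = false;
        }
        out.append(ch, start, length);
        int last = out.length() - 1;
        if (last >= 0 && Character.isHighSurrogate(out.charAt(last))) {
            pendingHigh = out.charAt(last);
            hasPending = true;
            out.setLength(last);
        }
        return out.toString();
    }

    // Call at the end of the character data to release anything still held.
    String flush() {
        if (!hasPending) {
            return "";
        }
        hasPending = false;
        return String.valueOf(pendingHigh);
    }

    public static void main(String[] args) {
        SplitPairBuffer buffer = new SplitPairBuffer();
        // U+1D11E is encoded as the surrogate pair D834 DD1E, split across two chunks here.
        char[] first = {'a', '\uD834'};
        char[] second = {'\uDD1E', 'b'};
        System.out.println(buffer.accept(first, 0, first.length).length());   // prints "1"
        System.out.println(buffer.accept(second, 0, second.length).length()); // prints "3"
    }
}
```

The text returned by accept never ends in an unpaired high surrogate, so it can be handed to a normalizer without cutting a supplementary character in two.
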
>>
>> If anyone would like to take a look and see if there are any obvious
>> problems, that would be great.
>>
>> thanks,
>> Richard
>>
>> [1] https://issues.apache.org/jira/browse/XERCESJ-1383
>> [2] http://www.saxproject.org/apidoc/org/xml/sax/package-summary.html#package_description
>> [3] http://www.w3.org/TR/DOM-Level-3-Core/core.html#parameter-check-character-normalization
>> [4] http://www.w3.org/TR/DOM-Level-3-Core/core.html#parameter-normalize-characters
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>
> Thanks.
>
> Michael Glavassevich
> XML Parser Development
> IBM Toronto Lab
> E-mail: [email protected]
> E-mail: [email protected]
