[ https://issues.apache.org/jira/browse/XERCESC-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16790159#comment-16790159 ]
Scott Cantor commented on XERCESC-2016:
---------------------------------------

For possibly later reference, this change includes a very odd (to me) change to XMLScanner that causes the scanner to accept XML versions in the declaration that start with "1." but don't end in 0 or 1. I can't imagine why that would be something XML 1.0 5th ed would have required, so that seems very suspicious to me, but the net effect of that change is that it's possible to pass invalid versions past the scanner that only get picked up incidentally by the DOM code. No idea what happens in SAX.

The diff is: http://svn.apache.org/viewvc/xerces/c/trunk/src/xercesc/internal/XMLScanner.cpp?r1=882548&r2=1517488

> XML 1.0 5th edition support
> ---------------------------
>
>                 Key: XERCESC-2016
>                 URL: https://issues.apache.org/jira/browse/XERCESC-2016
>             Project: Xerces-C++
>          Issue Type: Improvement
>          Components: Non-Validating Parser
>         Environment: All
>            Reporter: Rob Cameron
>            Assignee: Alberto Massari
>            Priority: Major
>             Fix For: 3.2.0
>
>         Attachments: diff5e
>
>
> Xerces-C currently applies XML 1.0 4th edition rules to name characters
> in XML 1.0 documents. XML 1.0 5th edition permits a broader class
> of name characters, based on those permitted in XML 1.1.
> Proposal: that Xerces-C 3.2.0 be updated to include support for XML 1.0
> 5th edition.
> Although our main work is with icXML, we've looked at making this change
> in Xerces-C original code base so that icXML support for XML 1.0 5e is
> compatible with us.
> I'm not entirely sure that I've handled everything, but the following change
> works in our test. The change plan is below and a svn diff file is
> attached.
> Here is the change plan.
> ----------------------------------
> (1) internal/CharTypeTables.hpp
>     Rename gFirstNameChars1_1 to be gFirstNameChars
>     Rename gNameChars1_1 to be gNameChars
> (2) util/XMLChar.cpp
> (2a)
>     Update initCharFlagTable1_1() to use the gFirstNameChars, gNameChars
>     Update initCharFlagTable() to use the set-ups from initCharFlagTable1_1()
>     to define gNameCharMask, gNCNameCharMask, and gFirstNameCharMask.
>         //
>         //  Name characters are special. A name is made up of a number of
>         //  different tables and some special case characters.
>         //
>         initOneTable(gNameChars, gNameCharMask);
>         //
>         //  Name characters are special. A name is made up of a number of
>         //  different tables and some special case characters.
>         //
>         initOneTable(gNameChars, gNCNameCharMask);
>         gTmpCharTable[chColon] &= ~gNCNameCharMask;
>         //
>         //  Then do the first name char
>         //
>         initOneTable(gFirstNameChars, gFirstNameCharMask);
> (2b) #define NEED_TO_GEN_TABLE
>     compile and do a sample run of a Xerces app, generate table.out
> (2c) Replace the XMLChar1_0::fgCharCharsTable1_0 definition of XMLChar.cpp
>     with that from table.out.
> (3) XMLChar.hpp
>     Modify XMLChar1_0::isFirstNameChar, XMLChar1_0::isFirstNCNameChar,
>     XMLChar1_0::isNameChar, XMLChar1_0::isNCNameChar
>     to each check for and allow characters in the #x10000-#xEFFFF range
>         else {
>             if ((toCheck >= 0xD800) && (toCheck <= 0xDB7F))
>                 if ((toCheck2 >= 0xDC00) && (toCheck2 <= 0xDFFF))
>                     return true;
>         }
> (4) Modify XMLReader::getName and XMLReader::getNCName
>     to allow surrogate pairs in Names and NCNames
>     (i.e., use the version 1.1 logic for both 1.0 and 1.1).
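
For reference, the surrogate-pair test in step (3) of the change plan can be sketched in isolation. This is a minimal sketch rather than the Xerces-C code: the function names isSupplementaryNameChar and decodeSurrogatePair are hypothetical and exist only for illustration, and UTF-16 code units are assumed to be 16-bit values. It shows why the high-surrogate bound is 0xDB7F: that is the last high surrogate whose pairs decode to a code point at or below #xEFFFF.

```cpp
#include <cstdint>

// Hypothetical helper (not part of Xerces-C): mirrors the range check from
// step (3). True when (high, low) is a UTF-16 surrogate pair that encodes a
// code point in the #x10000-#xEFFFF range.
constexpr bool isSupplementaryNameChar(std::uint16_t high, std::uint16_t low)
{
    return high >= 0xD800 && high <= 0xDB7F
        && low  >= 0xDC00 && low  <= 0xDFFF;
}

// Hypothetical helper: standard UTF-16 decoding of a surrogate pair into a
// Unicode code point. Only meaningful for a valid high/low pair.
constexpr std::uint32_t decodeSurrogatePair(std::uint16_t high, std::uint16_t low)
{
    return 0x10000u
         + ((static_cast<std::uint32_t>(high) - 0xD800u) << 10)
         + (static_cast<std::uint32_t>(low) - 0xDC00u);
}

// The accepted surrogate range maps exactly onto #x10000-#xEFFFF.
static_assert(decodeSurrogatePair(0xD800, 0xDC00) == 0x10000u, "lower bound");
static_assert(decodeSurrogatePair(0xDB7F, 0xDFFF) == 0xEFFFFu, "upper bound");
// 0xDB80 is the first high surrogate outside the range (it starts #xF0000).
static_assert(!isSupplementaryNameChar(0xDB80, 0xDC00), "above #xEFFFF");
```

A reader loop such as the one in step (4) would presumably need to consume both code units when this check succeeds. That part is not shown here because it depends on XMLReader internals.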