     [ http://issues.apache.org/jira/browse/XERCESC-1226?page=all ]

David Bertoni closed XERCESC-1226:
----------------------------------

> Parser reports bogus content when parsing
> -----------------------------------------
>
>          Key: XERCESC-1226
>          URL: http://issues.apache.org/jira/browse/XERCESC-1226
>      Project: Xerces-C++
>         Type: Bug
>   Components: SAX/SAX2
>     Versions: Nightly build (please specify the date)
>  Environment: All platforms
>     Reporter: David Bertoni
>  Attachments: diff.txt, test1.xml
>
> When parsing the following document, the parser reports garbage characters.
>
> <?xml version="1.0"?>
> <subject>Research [𝔸]rticle</subject>
>
> I traced this down to this function in XMLReader, starting on line 612:
>
> inline bool XMLReader::isPlainContentChar(const XMLCh toCheck)
> {
>     return ((fgCharCharsTable[toCheck] & gPlainContentCharMask) != 0);
> }
>
> Apparently, for the character "]" (U+005D RIGHT SQUARE BRACKET), the flags in
> fgCharCharsTable indicate it's not plain content. This causes the parser to
> misbehave badly, and deliver broken character data, including unpaired low
> surrogates.
>
> When I used the debugger, and returned "true" from this function, rather than
> false, the parser delivered the correct character data.
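
The table lookup described in the report can be illustrated with a minimal, self-contained sketch. It is not the Xerces-C++ source. The table contents, the mask value, and the helpers makeCharFlags and scanCharData are hypothetical names introduced for illustration, and the assumption that "<", "&" and "]" are the only non-plain characters is mine. The sketch only shows that a scanner can leave "]" out of the fast path and still deliver a following surrogate pair intact, provided the slow path resumes at the correct position.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <string>

// Hypothetical stand-ins for the names in the report; not the Xerces-C++ source.
using XMLCh = char16_t;
constexpr std::uint8_t kPlainContentMask = 0x01;

// Build a 64K flag table. Assumption: every UTF-16 code unit is "plain
// content" except '<', '&' and ']', which need special handling.
inline std::array<std::uint8_t, 0x10000> makeCharFlags()
{
    std::array<std::uint8_t, 0x10000> table{};
    table.fill(kPlainContentMask);
    for (XMLCh special : {u'<', u'&', u']'})
        table[special] = 0;
    return table;
}

// Same shape as the function quoted in the report: one table lookup and a mask.
inline bool isPlainContentChar(XMLCh toCheck)
{
    static const std::array<std::uint8_t, 0x10000> flags = makeCharFlags();
    return (flags[toCheck] & kPlainContentMask) != 0;
}

// Copy character data: runs of plain characters take the fast path, and each
// non-plain character is handled on its own. In this sketch a lone ']' is
// copied as ordinary data, and '<' or '&' ends the scan.
inline std::u16string scanCharData(const std::u16string& input)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < input.size()) {
        // Fast path: copy a whole run of plain content at once.
        const std::size_t start = i;
        while (i < input.size() && isPlainContentChar(input[i]))
            ++i;
        out.append(input, start, i - start);
        if (i == input.size())
            break;
        // Slow path: stop at markup, otherwise copy the ']' and resume
        // scanning at the very next code unit.
        if (input[i] != u']')
            break;
        out.push_back(input[i++]);
    }
    return out;
}
```

Feeding the character data from test1.xml (with the supplementary character written as a surrogate pair) through scanCharData returns it unchanged, even though isPlainContentChar reports false for "]".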
