[ https://issues.apache.org/jira/browse/XERCESC-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12612661#action_12612661 ]
David Bertoni commented on XERCESC-1816:
----------------------------------------
It's supposed to match the NameChar production in the XML recommendation, right?
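For concreteness, here is a rough, ASCII-only sketch of what matching NameChar would mean for "\c" and "\C". The function names are made up for illustration and are not part of Xerces, and the real production also covers non-ASCII letters, digits, combining characters and extenders, which are left out here.

```cpp
// ASCII-only approximation of the XML 1.0 NameChar production:
// letters, digits, '.', '-', '_' and ':'.
// Assumption: non-ASCII name characters are deliberately ignored.
bool isAsciiNameChar(char ch) {
    return (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z') ||
           (ch >= '0' && ch <= '9') ||
           ch == '.' || ch == '-' || ch == '_' || ch == ':';
}

// Hypothetical helper: "\c" matches any name character.
bool matchesBackslash_c(char ch) { return isAsciiNameChar(ch); }

// Hypothetical helper: "\C" is the complement of "\c".
bool matchesBackslash_C(char ch) { return !isAsciiNameChar(ch); }
```

The point is only that "\C" should accept exactly the characters "\c" rejects, so a single range plus a complement flag is enough to describe both.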
Looking at the RegxParser code, it's clear these escapes aren't even implemented:
Token* RegxParser::processBacksolidus_c() {

    XMLCh ch; //Must be in 0x0040-0x005F

    if (fOffset >= fStringLen
        || ((ch = fString[fOffset++]) & 0xFFE0) != 0x0040)
        ThrowXMLwithMemMgr(ParseException,XMLExcepts::Parser_Atom1,
                           fMemoryManager);

    processNext();
    return fTokenFactory->createChar(ch - 0x40);
}

Token* RegxParser::processBacksolidus_C() {

    // REVISIT - Do we throw an exception - we do not want to throw too
    // many exceptions
    return 0;
}

Token* RegxParser::processBacksolidus_i() {

    processNext();
    return fTokenFactory->createChar(chLatin_i);
}

Token* RegxParser::processBacksolidus_I() {

    //Ditto
    return 0;
}

I'm not sure why we "do not want to throw too many exceptions." Throwing seems
better to me than pretending something is implemented when it isn't.
I would guess that calling fTokenFactory->getRange(fgXMLNameChar, false) would
do the trick for "\c", and fTokenFactory->getRange(fgXMLNameChar, true) would
work for "\C". "\C" and "\I" cause an infinite loop because the associated
functions return 0 without calling processNext().
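To illustrate the infinite loop, here is a toy model of the problem. It is not the actual RegxParser control flow. ToyParser, its handlers and parseTerminates() are all hypothetical names, and the step cap exists only so the demonstration itself terminates.

```cpp
#include <cstddef>
#include <string>

// Hypothetical miniature parser, not Xerces code. It only models the
// contract that an escape handler must consume its input.
struct ToyParser {
    std::string input;
    std::size_t offset;

    explicit ToyParser(const std::string& s) : input(s), offset(0) {}

    // Stand-in for processNext(): skip the backslash and the letter.
    void processNext() { offset += 2; }

    // Well-behaved handler: consumes the escape, then reports a token.
    bool handleConsuming() { processNext(); return true; }

    // Stub in the style of processBacksolidus_C(): reports "no token"
    // without consuming anything, so the offset never moves.
    bool handleStub() { return false; }
};

// Returns true if the driver loop reaches the end of the pattern
// within maxSteps iterations, false if it is stuck.
bool parseTerminates(const std::string& pattern, bool useStub, int maxSteps) {
    ToyParser p(pattern);
    int steps = 0;
    while (p.offset < p.input.size()) {
        if (++steps > maxSteps)
            return false;  // offset stopped advancing: the real loop would spin forever
        if (useStub)
            p.handleStub();
        else
            p.handleConsuming();
    }
    return true;
}
```

With the consuming handler the loop finishes after one step per escape. With the stub it hits the cap, which is the toy equivalent of the hang reported for "\C" and "\I".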
What a mess -- thanks for actually working on this.
> Multi-character escape classes don't work correctly in regular expressions
> --------------------------------------------------------------------------
>
> Key: XERCESC-1816
> URL: https://issues.apache.org/jira/browse/XERCESC-1816
> Project: Xerces-C++
> Issue Type: Bug
> Components: Validating Parser (XML Schema)
> Affects Versions: 2.8.0, 3.0.0
> Reporter: John Snelson
>
> The regular expressions "\i", "\I", "\c" and "\C" do not work as specified in
> the XML Schema specification:
> http://www.w3.org/TR/xmlschema-2/#nt-MultiCharEsc
> In fact, "\I" and "\C" cause an infinite loop during the parsing of the
> regular expression, "\i" seems to only match the letter "i", and "\c" gives
> the error:
> A character in U+0040-U+005f must follow '\c'.
> I'd be happy to attempt to fix this bug, but I need some guidance as to what
> the code for "\c" is actually meant to be doing.