[jira] Commented: (XERCESC-1816) Multi-character escape classes don't work correctly in regular expressions

David Bertoni (JIRA) Thu, 10 Jul 2008 14:44:55 -0700

    [ https://issues.apache.org/jira/browse/XERCESC-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12612661#action_12612661 ]


David Bertoni commented on XERCESC-1816:
----------------------------------------

"\c" is supposed to match the NameChar production in the XML recommendation, right?

Looking at the code, it's clear these escapes aren't even implemented:

Token* RegxParser::processBacksolidus_c() {

    XMLCh ch; //Must be in 0x0040-0x005F

    if (fOffset >= fStringLen
        || ((ch = fString[fOffset++]) & 0xFFE0) != 0x0040)
        ThrowXMLwithMemMgr(ParseException, XMLExcepts::Parser_Atom1, fMemoryManager);

    processNext();
    return fTokenFactory->createChar(ch - 0x40);
}


Token* RegxParser::processBacksolidus_C() {

    // REVISIT - Do we throw an exception - we do not want to throw too
    // many exceptions
    return 0;
}

Token* RegxParser::processBacksolidus_i() {

    processNext();
    return fTokenFactory->createChar(chLatin_i);
}


Token* RegxParser::processBacksolidus_I() {

    //Ditto
    return 0;
}
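
For what it's worth, the mask test in processBacksolidus_c() looks like a Perl-style "\cX" control-character escape rather than the XML Schema "\c" class: it accepts only a character in 0x0040-0x005F after "\c" and maps it to a control code by subtracting 0x40. A minimal standalone sketch of that arithmetic follows. The helper names and the 16-bit Ch typedef are illustrative assumptions, not Xerces code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers, written only to illustrate the arithmetic.
using Ch = std::uint16_t;  // assumed stand-in for a 16-bit XMLCh

// True when ch lies in 0x0040-0x005F ('@' through '_'):
// clearing the low five bits must leave exactly 0x0040.
inline bool isControlLetter(Ch ch) {
    return (ch & 0xFFE0) == 0x0040;
}

// Maps '@'..'_' onto the control codes 0x00..0x1F.
inline Ch toControlCode(Ch ch) {
    return static_cast<Ch>(ch - 0x40);
}
```

So "\cM" would yield 0x0D, while a lowercase letter such as 'c' (0x63) fails the mask test, which is consistent with the error quoted in the report.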


I'm not sure why we "do not want to throw too many exceptions." Throwing seems 
better to me than pretending something's actually implemented when it's not.

I would guess calling fTokenFactory->getRange(fgXMLNameChar, false) would do 
the trick for "\c" and fTokenFactory->getRange(fgXMLNameChar, true) would work 
for "\C".  "\C" and "\I" are causing an infinite loop because the associated 
functions return 0 without calling processNext().
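
To illustrate the loop: if the driving code relies on each handler to advance the offset, a handler that returns without consuming anything leaves the parser stuck on the same backslash forever. Below is a toy model of that failure and of the suggested fix. It is not Xerces code, every name in it is made up, and a step budget stands in for the hang.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model only: none of these names are Xerces-C++ APIs.
enum class Tok { Char, NameChar, NotNameChar };

struct ParseResult {
    std::vector<Tok> tokens;
    bool finished;  // false if the step budget ran out (stand-in for a hang)
};

// Handler shaped like the quoted processBacksolidus_C(): no token, no advance.
inline bool escapeUnimplemented(std::size_t& /*offset*/, Tok& /*out*/) {
    return false;
}

// Handler shaped like the suggested fix: consume the escape, emit a range.
inline bool escapeAsRange(std::size_t& offset, Tok& out, bool complement) {
    offset += 2;  // skip the backslash and the class letter
    out = complement ? Tok::NotNameChar : Tok::NameChar;
    return true;
}

inline ParseResult parse(const std::string& pattern, bool fixed,
                         std::size_t maxSteps = 1000) {
    ParseResult r{{}, false};
    std::size_t offset = 0;
    for (std::size_t step = 0; step < maxSteps; ++step) {
        if (offset >= pattern.size()) {
            r.finished = true;
            return r;
        }
        const bool isEscape = pattern[offset] == '\\'
            && offset + 1 < pattern.size()
            && (pattern[offset + 1] == 'c' || pattern[offset + 1] == 'C');
        if (isEscape) {
            Tok t = Tok::Char;
            const bool complement = pattern[offset + 1] == 'C';
            const bool ok = fixed ? escapeAsRange(offset, t, complement)
                                  : escapeUnimplemented(offset, t);
            if (ok) r.tokens.push_back(t);
            continue;  // the driver relies on the handler to advance
        }
        r.tokens.push_back(Tok::Char);  // ordinary character
        ++offset;
    }
    return r;  // budget exhausted: offset never reached the end
}
```

With the unimplemented handler, parse("a\\Cb", false) runs out of steps without finishing; with the range-returning handler, parse("a\\Cb", true) finishes with three tokens.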

What a mess -- thanks for actually working on this.

> Multi-character escape classes don't work correctly in regular expressions
> --------------------------------------------------------------------------
>
>                 Key: XERCESC-1816
>                 URL: https://issues.apache.org/jira/browse/XERCESC-1816
>             Project: Xerces-C++
>          Issue Type: Bug
>          Components: Validating Parser (XML Schema)
>    Affects Versions: 2.8.0, 3.0.0
>            Reporter: John Snelson
>
> The regular expressions "\i", "\I", "\c" and "\C" do not work as specified in 
> the XML Schema specification:
> http://www.w3.org/TR/xmlschema-2/#nt-MultiCharEsc
> In fact, "\I" and "\C" cause an infinite loop during the parsing of the 
> regular expression, "\i" seems to only match the letter "i", and "\c" gives 
> the error:
> A character in U+0040-U+005f must follow '\c'.
> I'd be happy to attempt to fix this bug, but I need some guidance as to what 
> the code for "\c" is actually meant to be doing.
