[ https://issues.apache.org/jira/browse/XERCESC-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12613521#action_12613521 ]
David Bertoni commented on XERCESC-1816:
----------------------------------------
http://perldoc.perl.org/perlre.html#Regular-Expressions
According to the Perl documentation linked above, \c matches a single
control code, which explains the existing code.
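
For illustration, here is a minimal sketch of that control-code reading of \c. The helper name controlCodeFor is hypothetical and this is not the actual Xerces-C++ code. It assumes the character after \c must lie in U+0040-U+005F, the range named in the error message quoted in the issue, and that the control code is obtained by subtracting 0x40.

```cpp
#include <optional>

// Hypothetical helper, not the Xerces-C++ implementation: maps the character
// that follows "\c" to the control code it denotes under the Perl-style
// reading. Only U+0040 through U+005F are accepted (assumption taken from
// the error message quoted in the issue description).
std::optional<char32_t> controlCodeFor(char32_t following)
{
    if (following < 0x40 || following > 0x5F)
        return std::nullopt;  // caller would report the "\c" error here

    // Subtracting 0x40 folds '@'..'_' onto U+0000..U+001F,
    // so 'A' (U+0041) becomes U+0001.
    return static_cast<char32_t>(following - 0x40);
}
```

Under this reading \cA denotes U+0001, while a lowercase letter after \c is rejected, which would match the error John reported.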
\C supports matching a single byte, even in Unicode mode:
" \C Match a single C char (octet) even under Unicode.
NOTE: breaks up characters into their UTF-8 bytes,
so you may end up with malformed pieces of UTF-8.
Unsupported in lookbehind."
Why don't we just report an error if the expression contains \i or \I in
non-schema mode?
"The escape sequence '{0}' is supported only in XML Schema mode."
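
A rough sketch of what such a check could look like, assuming a pre-scan over the pattern before parsing. The function name checkSchemaOnlyEscapes and the schemaMode flag are hypothetical and do not correspond to existing Xerces-C++ APIs; a real fix would more likely live inside the regex tokenizer and use the message catalog for the '{0}' substitution.

```cpp
#include <optional>
#include <string>

// Hypothetical sketch, not Xerces-C++ code: scans a pattern and returns an
// error message if "\i" or "\I" appears while schema mode is off.
std::optional<std::string> checkSchemaOnlyEscapes(const std::string& pattern,
                                                  bool schemaMode)
{
    if (schemaMode)
        return std::nullopt;  // both escapes are legal in XML Schema mode

    for (std::string::size_type i = 0; i + 1 < pattern.size(); ++i) {
        if (pattern[i] != '\\')
            continue;

        const char next = pattern[i + 1];
        if (next == 'i' || next == 'I') {
            // Fill in the '{0}' placeholder by hand for this sketch.
            return std::string("The escape sequence '\\") + next +
                   "' is supported only in XML Schema mode.";
        }
        ++i;  // skip the escaped character so "\\i" is not misread as "\i"
    }
    return std::nullopt;
}
```

With schemaMode set to false, a pattern such as a\ib yields the message above with '\i' substituted, while an escaped backslash followed by a literal i passes through untouched.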
> Multi-character escape classes don't work correctly in regular expressions
> --------------------------------------------------------------------------
>
> Key: XERCESC-1816
> URL: https://issues.apache.org/jira/browse/XERCESC-1816
> Project: Xerces-C++
> Issue Type: Bug
> Components: Validating Parser (XML Schema)
> Affects Versions: 2.8.0, 3.0.0
> Reporter: John Snelson
>
> The regular expressions "\i", "\I", "\c" and "\C" do not work as specified in
> the XML Schema specification:
> http://www.w3.org/TR/xmlschema-2/#nt-MultiCharEsc
> In fact, "\I" and "\C" cause an infinite loop during the parsing of the
> regular expression, "\i" seems to only match the letter "i", and "\c" gives
> the error:
> A character in U+0040-U+005f must follow '\c'.
> I'd be happy to attempt to fix this bug, but I need some guidance as to what
> the code for "\c" is actually meant to be doing.