Hi all,

Regarding this topic, I've made a bit of progress.

Xerces's XSD 1.1 XPath 2.0 regex implementation delegates to Java's
regex implementation.
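
For reference, here is a minimal sketch of how java.util.regex itself
behaves in free-spacing mode. It assumes the XPath "x" flag ends up as
Java's Pattern.COMMENTS; the class and method names (FlagXDemo,
matchesWithX) are hypothetical and are not taken from the Xerces sources.

```java
import java.util.regex.Pattern;

// Illustrative only: FlagXDemo and matchesWithX are hypothetical names,
// not part of Xerces.
class FlagXDemo {

    // Assumption: the XPath "x" flag is mapped to Pattern.COMMENTS.
    static boolean matchesWithX(String regex, String input) {
        return Pattern.compile(regex, Pattern.COMMENTS).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Whitespace inside the character class is dropped,
        // so this behaves like [0-9]{3}.
        System.out.println(matchesWithX("[0- 9]{3}", "123")); // true

        // The space inside [a b] is dropped too, so a literal
        // space is no longer a member of the class.
        System.out.println(matchesWithX("[a b]", " "));        // false

        // Everything from # to end of line is treated as a comment,
        // so the pattern is effectively just "a".
        System.out.println(matchesWithX("a#b", "a"));          // true
        System.out.println(matchesWithX("a#b", "a#b"));        // false
    }
}
```

The first two calls show whitespace being removed inside a character
class, and the last two show # starting a comment. Both are plain Java
behaviours, independent of Xerces.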

As of now, Xerces's XSD 1.1 XPath 2.0 regex implementation is not fully
compliant when the regex flag "x" is used. Currently, when this flag is
enabled, Xerces ignores all whitespace within the regex (including
whitespace within character class expressions, contrary to what the
XPath 2.0 spec specifies), and treats anything from # to the end of the
line as a comment and ignores it (which the XPath 2.0 spec doesn't
provide for either). Xerces inherits these behaviours from Java's regex
implementation.

I was able to take the Java 1.8 regex source code and modify it to fix
the above XPath 2.0 regex compliance issues (I had to restrict the code
to Java 1.7 instead of 1.8 to make this work).

I intend to contribute these improvements to the XercesJ XPath 2.0
sources. This will require raising the Java source level to 1.7 for the
Xerces XSD 1.1 sources (from the current 1.4). I also intend to raise
the Java source level of the Xerces trunk to 1.7, to keep it consistent
with the Xerces XSD 1.1 sources.

Any thoughts on these issues are welcome.

On Sun, Nov 21, 2021 at 6:20 PM Mukul Gandhi <muk...@apache.org> wrote:

> Hi all,
>    Based on my memory of a few past discussions on this list (and on
> a few of the XercesJ jira discussions as well), I have the feeling that
> Xerces's XSD 1.1 XPath 2.0 regex implementation is slightly
> non-compliant. I wish to discuss that a little bit here.
>
> The XPath 2.0 F&O regex requirements are specified at
> https://www.w3.org/TR/xquery-operators/#regex-syntax [1].
>
> My current analysis says that Xerces's XSD 1.1 XPath 2.0 regex
> implementation is compliant to a great extent with the above cited
> specification [1]. The section "7.6.1.1 Flags" mentioned at [1] says
> the following at the bottom:
>
> x: If present, whitespace characters (#x9, #xA, #xD and #x20) in the
> regular expression are removed prior to matching with one exception:
> whitespace characters within character class expressions (charClassExpr)
> are not removed. This flag can be used, for example, to break up long
> regular expressions into readable lines. [2]
>
> We comply with point [2] cited above, except for the following part of
> it: "with one exception: whitespace characters within character class
> expressions (charClassExpr) are not removed". Xerces's XPath 2.0 regex
> implementation seems to remove whitespace from within character class
> expressions as well, when the flag "x" is present.
>
> To test the above claims, I wrote the following XML Schema 1.1 example.
>
> XML document:
>
> <?xml version="1.0"?>
> <X>123</X>
>
> XML Schema 1.1 document:
>
> <?xml version="1.0"?>
> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
>
>    <xs:element name="X">
>       <xs:simpleType>
>          <xs:restriction base="xs:string">
>             <xs:assertion test="matches($value, '[0- 9]{3}', 'x')"/>
>          </xs:restriction>
>       </xs:simpleType>
>    </xs:element>
>
> </xs:schema>
>
> For the above XSD 1.1 validation example, Xerces reports a valid
> schema assessment outcome. In my opinion, the xs:assertion above should
> have failed (i.e. should have returned false), since there's a space
> character within [] (it's a regex character class) in that regex.
>
> Other than the implementation deficiency mentioned above, I find that
> Xerces's XSD 1.1 XPath 2.0 processor's regex implementation is
> compliant with the XPath 2.0 regex spec.
>
> Actually, Xerces's XSD 1.1 XPath 2.0 processor's regex implementation
> (specifically, the behaviour of the XPath 2.0 regex flag "x") behaves
> very much like Java's regex support.
>
> I'd be happy to continue the discussion about this topic.
>

--
Regards,
Mukul Gandhi