I think we should bump the platform to Java 8 if that gives us the regex
support we need.

This is a new feature, so it's OK to modernize. Java 8 is almost EOL
anyway. Updating only to Java 7 feels like trading one set of
restrictions for another.

If you were to backport this to Java 7, my guess is that you'd want to
drop it when we eventually go to Java 8.

Gary

On Tue, Nov 23, 2021, 07:57 Mukul Gandhi <muk...@apache.org> wrote:

> Hi all,
>     I've made a bit of progress on this topic.
>
> Xerces's XSD 1.1 XPath 2.0 regex implementation delegates to Java's
> regex implementation.
>
> As of now, Xerces's XSD 1.1 XPath 2.0 regex implementation is not fully
> compliant when the regex flag "x" is used. Currently, when this flag is
> enabled, Xerces ignores all whitespace within the regex (including
> whitespace within character class expressions, which the XPath 2.0 spec
> says should be kept), and anything from # to EOL is treated as a comment
> and ignored (which the XPath 2.0 spec doesn't call for either). Xerces
> inherits this behaviour from Java's regex implementation.
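
For anyone who wants to see the inherited behaviour outside Xerces, here
is a minimal sketch. It assumes the XPath "x" flag ends up as
java.util.regex.Pattern.COMMENTS, which is only my reading of the
description above and not something checked against the Xerces sources.
The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

// Illustration only: shows how java.util.regex treats whitespace and '#'
// when Pattern.COMMENTS is set. Not Xerces code.
class CommentsFlagDemo {

    // Hypothetical helper: full-match 'input' against 'regex' in COMMENTS mode.
    static boolean matchesWithComments(String regex, String input) {
        return Pattern.compile(regex, Pattern.COMMENTS).matcher(input).matches();
    }

    public static void main(String[] args) {
        // The space inside the character class is dropped, so this acts like [0-9]{3}.
        System.out.println(matchesWithComments("[0- 9]{3}", "123"));

        // Everything from '#' to end of line is treated as a comment.
        System.out.println(matchesWithComments("\\d+ # digits", "42"));
    }
}
```

Both calls should report a match, which lines up with the schema example
further down validating "123" against '[0- 9]{3}'.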
>
> I was able to take the Java 1.8 regex source code and modify it (I had
> to restrict it to Java 1.7 rather than 1.8 to make this work) to fix the
> above XPath 2.0 regex compliance issues.
>
> I intend to contribute these improvements to the XercesJ XPath 2.0
> sources. This will require raising the Xerces Java source level to 1.7
> for the XSD 1.1 sources (from the current 1.4). I also intend to raise
> the Xerces trunk's Java source level to 1.7, to keep it consistent with
> the Xerces XSD 1.1 sources.
>
> Any thoughts on these issues are welcome.
>
> On Sun, Nov 21, 2021 at 6:20 PM Mukul Gandhi <muk...@apache.org> wrote:
>
>> Hi all,
>>     Based on my memory of a few past discussions on this list (and on a
>> few of the XercesJ Jira discussions as well), I have the feeling that
>> Xerces's XSD 1.1 XPath 2.0 regex implementation is slightly
>> non-compliant. I'd like to discuss that a little here.
>>
>> The XPath 2.0 F&O regex requirements are specified at
>> https://www.w3.org/TR/xquery-operators/#regex-syntax [1].
>>
>> My current analysis is that Xerces's XSD 1.1 XPath 2.0 regex
>> implementation complies to a great extent with the specification cited
>> above [1]. Section "7.6.1.1 Flags" of [1] says the following near the
>> bottom:
>>
>> x: If present, whitespace characters (#x9, #xA, #xD and #x20) in the
>> regular expression are removed prior to matching with one exception:
>> whitespace characters within character class expressions (charClassExpr)
>> are not removed. This flag can be used, for example, to break up long
>> regular expressions into readable lines. [2]
>>
>> We comply with point [2] above, except for this part of it: "with one
>> exception: whitespace characters within character class expressions
>> (charClassExpr) are not removed". Xerces's XPath 2.0 regex
>> implementation seems to remove whitespace from within character class
>> expressions as well when the flag "x" is present.
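
As a rough picture of the rule quoted in [2], and only that: the sketch
below strips #x9, #xA, #xD and #x20 from a regex everywhere except inside
[...]. This is not the approach described elsewhere in this thread (which
modifies the JDK regex sources), and the helper name is hypothetical. It
also assumes that a backslash escape should be copied through untouched
and that brackets can nest because of character class subtraction.

```java
// Illustration only: a pre-processing pass for the XPath 2.0 "x" flag.
// Not Xerces code and not a JDK API.
class XFlagWhitespace {

    // Hypothetical helper: remove tab, LF, CR and space outside character classes.
    static String stripOutsideCharClasses(String regex) {
        StringBuilder out = new StringBuilder(regex.length());
        int depth = 0; // how many [...] we are currently inside
        for (int i = 0; i < regex.length(); i++) {
            char c = regex.charAt(i);
            if (c == '\\' && i + 1 < regex.length()) {
                // Assumption: keep an escape and the character it escapes as-is.
                out.append(c).append(regex.charAt(++i));
                continue;
            }
            if (c == '[') {
                depth++;
            } else if (c == ']' && depth > 0) {
                depth--;
            }
            boolean whitespace = c == '\t' || c == '\n' || c == '\r' || c == ' ';
            if (whitespace && depth == 0) {
                continue; // dropped: whitespace outside any character class
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripOutsideCharClasses("[a b] c"));          // [a b]c
        System.out.println(stripOutsideCharClasses("\\d{3} - \\d{4}"));  // \d{3}-\d{4}
        System.out.println(stripOutsideCharClasses("[0- 9]{3}"));        // [0- 9]{3}
    }
}
```

With a pass like this, the space in '[0- 9]{3}' survives and is left for
the regex engine to deal with, instead of being silently removed.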
>>
>> To test the above claim, I wrote the following XML Schema 1.1 example.
>>
>> XML document:
>> <?xml version="1.0"?>
>> <X>123</X>
>>
>> XML Schema 1.1 document:
>> <?xml version="1.0"?>
>> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
>>
>>     <xs:element name="X">
>>        <xs:simpleType>
>>          <xs:restriction base="xs:string">
>>            <xs:assertion test="matches($value, '[0- 9]{3}', 'x')"/>
>>          </xs:restriction>
>>        </xs:simpleType>
>>     </xs:element>
>>
>> </xs:schema>
>>
>> For the above XSD 1.1 validation example, Xerces reports a valid schema
>> assessment outcome. In my opinion, the xs:assertion above should have
>> failed (i.e. should have returned false), since there's a space
>> character within [] (a regex character class) in the regex.
>>
>> Other than the implementation deficiency mentioned above, I find that
>> the regex implementation of Xerces's XSD 1.1 XPath 2.0 processor is
>> compliant with the XPath 2.0 regex spec.
>>
>> In fact, the regex implementation of Xerces's XSD 1.1 XPath 2.0
>> processor (specifically, the behaviour of the XPath 2.0 regex flag "x")
>> behaves very much like Java's regex support.
>>
>> I'd be happy to continue the discussion on this topic.
>>
>
>
> --
> Regards,
> Mukul Gandhi
>
