[ 
https://issues.apache.org/jira/browse/XERCESJ-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17445809#comment-17445809
 ] 

Mukul Gandhi edited comment on XERCESJ-1716 at 11/20/21, 10:51 AM:
-------------------------------------------------------------------

I got a chance to look at the original bug report in this thread.

Instead of,

<xs:simpleType name="SimpleText255NotBlankType">
    <xs:annotation>
        <xs:documentation xml:lang="en">String of maximum 255 characters, not blank</xs:documentation>
    </xs:annotation>
    <xs:restriction base="xs:string">
        <xs:minLength value="1"/>
        <xs:maxLength value="255"/>
        <xs:pattern value=".*[^\s].*"/>
    </xs:restriction>
</xs:simpleType>

We can write the following, which validates very quickly against the provided
XML document long_string.xml:

<xs:simpleType name="SimpleText255NotBlankType">
    <xs:annotation>
        <xs:documentation xml:lang="en">String of maximum 255 characters, not blank</xs:documentation>
    </xs:annotation>
    <xs:restriction base="xs:string">
        <xs:pattern value="[^\s]{1,255}"/>
    </xs:restriction>
</xs:simpleType>

I think that, in general, the Xerces XSD processor should not evaluate the
xs:minLength and xs:maxLength facets before the xs:pattern facet. The XSD
specification doesn't prescribe any such ordering, so implementers may treat
the order of facet evaluation within a simple type as
implementation-dependent.
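
To make the behaviour of the two patterns concrete, here is a small sketch. It uses java.util.regex rather than the Xerces XSD regex engine, so it only approximates schema validation and says nothing about Xerces timings. The class name PatternCompare and its helper methods are hypothetical and exist only for this illustration.

```java
import java.util.regex.Pattern;

public class PatternCompare {
    // Patterns copied from the two schema variants above.
    static final Pattern ORIGINAL = Pattern.compile(".*[^\\s].*");
    static final Pattern REWRITTEN = Pattern.compile("[^\\s]{1,255}");

    // Approximates the original type: minLength=1, maxLength=255, then the pattern.
    // The length is checked first here, so the regex never sees an over-long value.
    static boolean originalAccepts(String value) {
        int length = value.codePointCount(0, value.length());
        return length >= 1 && length <= 255 && ORIGINAL.matcher(value).matches();
    }

    // Approximates the rewritten type: the pattern alone also bounds the length.
    static boolean rewrittenAccepts(String value) {
        return REWRITTEN.matcher(value).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"abc", "   ", "", "hello world", "x".repeat(256)};
        for (String s : samples) {
            System.out.printf("length=%d original=%b rewritten=%b%n",
                    s.length(), originalAccepts(s), rewrittenAccepts(s));
        }
    }
}
```

Under this approximation both variants reject blank, empty, and over-long values. They differ on values that contain whitespace: "hello world" is accepted by the original facets but rejected by [^\s]{1,255}, so the rewrite is only a drop-in replacement if embedded whitespace is not expected.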


> Validating XML against XSD is slow for long strings if pattern restrictions 
> are defined, even if maxLength is restricted.
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: XERCESJ-1716
>                 URL: https://issues.apache.org/jira/browse/XERCESJ-1716
>             Project: Xerces2-J
>          Issue Type: Improvement
>            Reporter: Márk Petrényi
>            Assignee: Mukul Gandhi
>            Priority: Major
>         Attachments: long_string.xml, unsafe.xsd, workaround.xsd
>
>
> Validating XML against XSD is slow for long strings if pattern restrictions 
> are defined, even if maxLength is restricted.
> We have the following simple type defined in our xsd (unsafe.xsd):
> {code:xml}
> <xsd:simpleType name="SimpleText255NotBlankType">
>   <xsd:annotation>
>     <xsd:documentation xml:lang="en">String of maximum 255 characters, not blank</xsd:documentation>
>   </xsd:annotation>
>   <xsd:restriction base="xsd:string">
>     <xsd:minLength value="1"/>
>     <xsd:maxLength value="255"/>
>     <xsd:pattern value=".*[^\s].*"/>
>   </xsd:restriction>
> </xsd:simpleType>
> {code}
> The problem is that when a really long string (ca. 1,000,000 characters) is
> provided as a value in the input XML, we would expect it to be rejected
> quickly because of its length. In fact, validation takes several minutes,
> since the regex is evaluated before the maxLength restriction.
> We found a workaround for the issue: defining the simpleType this way
> (workaround.xsd):
> {code:xml}
> <xsd:simpleType name="SimpleText255Type">
>   <xsd:annotation>
>     <xsd:documentation xml:lang="en">String of maximum 255 characters</xsd:documentation>
>   </xsd:annotation>
>   <xsd:restriction base="xsd:string">
>     <xsd:minLength value="1"/>
>     <xsd:maxLength value="255"/>
>     <xsd:pattern value=".{1,255}"/>
>   </xsd:restriction>
> </xsd:simpleType>
> <xsd:simpleType name="SimpleText255NotBlankType">
>   <xsd:annotation>
>     <xsd:documentation xml:lang="en">String of maximum 255 characters, not blank</xsd:documentation>
>   </xsd:annotation>
>   <xsd:restriction base="SimpleText255Type">
>     <xsd:pattern value=".*[^\s].*"/>
>   </xsd:restriction>
> </xsd:simpleType>
> {code}
> The workaround only works because the implementation of XSSimpleType
> builds a Vector of the regex patterns, so the {{.{1,255}}} pattern is
> evaluated first. It fails relatively quickly, and the time-consuming
> second regex is never checked.
> It would be great to have the regex pattern checked after the other
> XSD restrictions (minLength, maxLength, etc.) have been validated, or to
> have control over the validation ordering, thus avoiding unnecessarily slow
> validations and the use of a workaround based on undocumented features.
> I attached the XSDs referenced above and an XML file containing a long
> string value. The problem can be reproduced using the SourceValidator from
> the Xerces2-J samples:
> The original xsd with slow validation:
> {code:java}
> java jaxp.SourceValidator -a unsafe.xsd -i long_string.xml
> {code}
> The workaround xsd with normal run-time:
> {code:java}
> java jaxp.SourceValidator -a workaround.xsd -i long_string.xml
> {code}
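
For readers who do not have the Xerces2-J samples at hand, the reproduction quoted above can be approximated with the standard javax.xml.validation API. This is only a sketch: the class name FacetTimingSketch, the element name text, and the inline schema are assumptions made for illustration, and the attached unsafe.xsd and long_string.xml are not used. Unless Xerces2-J is on the classpath, the JDK's bundled JAXP implementation is used, so timings may differ from those in the report.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class FacetTimingSketch {
    // Hypothetical inline schema carrying the same facets as the reported type.
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:element name='text'>"
        + "<xsd:simpleType><xsd:restriction base='xsd:string'>"
        + "<xsd:minLength value='1'/><xsd:maxLength value='255'/>"
        + "<xsd:pattern value='.*[^\\s].*'/>"
        + "</xsd:restriction></xsd:simpleType>"
        + "</xsd:element></xsd:schema>";

    static Schema compile() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema did not compile", e);
        }
    }

    // Validates <text>value</text>; value must not contain XML markup characters.
    static boolean isValid(String value) {
        String xml = "<text>" + value + "</text>";
        try {
            compile().newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // a facet (length or pattern) rejected the value
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Lengths are kept small on purpose; very long values may take a long time.
        for (int length : new int[] {10, 300}) {
            long start = System.nanoTime();
            boolean valid = isValid("x".repeat(length));
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("length=%d valid=%b elapsed=%d ms%n", length, valid, millis);
        }
    }
}
```

Raising the lengths in main towards the size mentioned in the report is one way to observe how validation time grows with the input, though the outcome depends on the regex engine in use and on the content of the value.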



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
