Márk Petrényi created XERCESJ-1716:
--------------------------------------

             Summary: Validating XML against XSD is slow for long strings if 
pattern restrictions are defined, even if maxLength is restricted.
                 Key: XERCESJ-1716
                 URL: https://issues.apache.org/jira/browse/XERCESJ-1716
             Project: Xerces2-J
          Issue Type: Bug
            Reporter: Márk Petrényi
         Attachments: long_string.xml, unsafe.xsd, workaround.xsd

Validating XML against XSD is slow for long strings if pattern restrictions are 
defined, even if maxLength is restricted.

We have the following simple type defined in our xsd (unsafe.xsd):
{code:xml}
<xsd:simpleType name="SimpleText255NotBlankType">
 <xsd:annotation>
 <xsd:documentation xml:lang="en">String of maximum 255 characters, not 
blank</xsd:documentation>
 </xsd:annotation>
 <xsd:restriction base="xsd:string">
 <xsd:minLength value="1"/>
 <xsd:maxLength value="255"/>
 <xsd:pattern value=".*[^\s].*"/>
 </xsd:restriction>
</xsd:simpleType>
{code}
The problem is that when a very long string (ca. 1,000,000 characters) is 
provided as a value in the input XML, we would expect it to be rejected 
quickly because of its length. Instead, the validation takes several minutes, 
since the regex is evaluated before the maxLength restriction.

We found a workaround for the issue if we define the simpleType this way 
(workaround.xsd):
{code:xml}
 <xsd:simpleType name="SimpleText255Type">
 <xsd:annotation>
 <xsd:documentation xml:lang="en">String of maximum 255 
characters</xsd:documentation>
 </xsd:annotation>
 <xsd:restriction base="xsd:string">
 <xsd:minLength value="1"/>
 <xsd:maxLength value="255"/>
 <xsd:pattern value=".{1,255}"/>
 </xsd:restriction>
 </xsd:simpleType>
 <xsd:simpleType name="SimpleText255NotBlankType">
 <xsd:annotation>
 <xsd:documentation xml:lang="en">String of maximum 255 characters, not 
blank</xsd:documentation>
 </xsd:annotation>
 <xsd:restriction base="SimpleText255Type">
 <xsd:pattern value=".*[^\s].*"/>
 </xsd:restriction>
 </xsd:simpleType>
{code}
The workaround only works because the XSSimpleType implementation builds a 
Vector of the regex patterns, so the {{.{1,255}}} pattern is evaluated first. 
It fails relatively quickly, so the time-consuming second regex is never 
checked.

It would be great to have the regex pattern checked after the other XSD 
restrictions (minLength, maxLength, etc.), or to have control over the 
validation order. That would avoid unnecessarily slow validations and the need 
for a workaround based on undocumented behavior.
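
The requested ordering can be illustrated with a minimal sketch. This is not Xerces code: the class name FacetOrderSketch, the method isValid and the counter patternEvaluations are hypothetical, and java.util.regex stands in for the XSD regex dialect (the two differ in details). It only shows the idea of running the cheap length facets first so that an over-long value never reaches the regex.

```java
import java.util.regex.Pattern;

// Hypothetical illustration, not the Xerces implementation.
class FacetOrderSketch {
    // java.util.regex stand-in for the XSD pattern facet ".*[^\s].*"
    private static final Pattern NOT_BLANK = Pattern.compile(".*[^\\s].*");

    // Counts how often the (potentially expensive) regex was actually run.
    static int patternEvaluations = 0;

    static boolean isValid(String value, int minLength, int maxLength) {
        // Cheap facets first: length is counted in code points.
        int length = value.codePointCount(0, value.length());
        if (length < minLength || length > maxLength) {
            return false; // rejected without touching the regex
        }
        // Only values that pass the length facets reach the pattern facet.
        patternEvaluations++;
        return NOT_BLANK.matcher(value).matches();
    }

    public static void main(String[] args) {
        StringBuilder longValue = new StringBuilder();
        for (int i = 0; i < 1_000_000; i++) {
            longValue.append('a');
        }
        System.out.println("short value valid: " + isValid("hello", 1, 255));
        System.out.println("long value valid: " + isValid(longValue.toString(), 1, 255));
        System.out.println("pattern evaluations: " + patternEvaluations);
    }
}
```

With this ordering the cost of rejecting an over-long value is bounded by the length check, regardless of how expensive the pattern is.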

I attached the XSDs referenced above and an XML file containing a long string 
value. The problem can be reproduced using the SourceValidator from the 
Xerces2-J samples:

The original xsd with slow validation:
{code:bash}
java jaxp.SourceValidator -a unsafe.xsd -i long_string.xml
{code}
The workaround xsd with normal run-time:
{code:bash}
java jaxp.SourceValidator -a workaround.xsd -i long_string.xml
{code}
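
For a self-contained check without the attachments, the same measurement can be sketched with the standard javax.xml.validation API. The class name ValidationTimingSketch, the helper names and the element name "value" are assumptions made for illustration (the attached files are not reproduced here), and the schema below is a cut-down inline copy of the type from unsafe.xsd. Timings depend on which Xerces version is on the classpath; the parser bundled with the JDK may behave differently from Xerces2-J.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical timing harness; names are illustrative only.
class ValidationTimingSketch {
    // Inline schema: one element "value" (assumed name) using the type from the report.
    static final String XSD =
          "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xsd:element name=\"value\" type=\"SimpleText255NotBlankType\"/>"
        + "<xsd:simpleType name=\"SimpleText255NotBlankType\">"
        + "<xsd:restriction base=\"xsd:string\">"
        + "<xsd:minLength value=\"1\"/>"
        + "<xsd:maxLength value=\"255\"/>"
        + "<xsd:pattern value=\".*[^\\s].*\"/>"
        + "</xsd:restriction>"
        + "</xsd:simpleType>"
        + "</xsd:schema>";

    // Builds a string consisting of 'count' copies of 'c'.
    static String repeat(char c, int count) {
        char[] chars = new char[count];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    // Returns true if <value>text</value> validates against XSD.
    // 'text' must not contain XML markup characters.
    static boolean isValid(String text) {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema;
        try {
            schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("inline schema failed to compile", e);
        }
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader("<value>" + text + "</value>")));
            return true;
        } catch (SAXException e) {
            return false; // the default error handler throws on validation errors
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Pass e.g. 1000000 as the first argument to mimic long_string.xml.
        int length = args.length > 0 ? Integer.parseInt(args[0]) : 300;
        String text = repeat('a', length);
        long start = System.nanoTime();
        boolean valid = isValid(text);
        // Elapsed time includes schema compilation as well as validation.
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("length=" + length + " valid=" + valid + " elapsedMs=" + elapsedMs);
    }
}
```

Running it with increasing lengths gives a rough picture of how validation time grows with the input size for a given parser.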


