[ 
https://issues.apache.org/jira/browse/XERCESJ-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968107#comment-16968107
 ] 

Mukul Gandhi commented on XERCESJ-1716:
---------------------------------------

Out of curiosity, I tried your unsafe.xsd schema with the type 
"SimpleText255NotBlankType" written as follows:

<xs:simpleType name="SimpleText255NotBlankType">
     <xs:annotation>
         <xs:documentation xml:lang="en">String of maximum 255 characters, not blank</xs:documentation>
     </xs:annotation>
     <xs:restriction base="xs:string">
         <xs:minLength value="1"/>
         <xs:maxLength value="255"/>
         <xs:assertion test="not(contains($value, ' '))"/>
     </xs:restriction>
</xs:simpleType>

(I'm using an XSD 1.1 <assertion> facet instead of <pattern>)

This validates quickly with the XML document you've posted.

Therefore, this seems to be another workaround for your use case.
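
One thing worth checking before adopting this: the assertion and the original 
pattern may not accept exactly the same values. The sketch below is only an 
approximation. It evaluates the assertion's XPath expression with the JDK's 
XPath 1.0 engine (not the XSD 1.1 processor in Xerces) and approximates the 
.*[^\s].* pattern with java.util.regex. The class and method names are made 
up for illustration.

```java
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

// Hypothetical helper: compares the two "not blank" checks on plain strings.
final class AssertionVsPattern {

    // Evaluates the assertion's test expression with $value bound to the input.
    // Assumption: the JDK XPath 1.0 engine stands in for the XSD 1.1 processor.
    static boolean assertionAccepts(String value) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setXPathVariableResolver(
                name -> "value".equals(name.getLocalPart()) ? value : null);
            // An empty document serves as the context; the expression ignores it.
            Document context = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            return (Boolean) xpath.evaluate(
                "not(contains($value, ' '))", context, XPathConstants.BOOLEAN);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Approximates the pattern facet: the whole value must match, and it must
    // contain at least one non-whitespace character.
    static boolean patternAccepts(String value) {
        return Pattern.matches(".*[^\\s].*", value);
    }

    public static void main(String[] args) {
        for (String v : new String[] {"hello", "hello world", "   "}) {
            System.out.println("\"" + v + "\" assertion=" + assertionAccepts(v)
                + " pattern=" + patternAccepts(v));
        }
    }
}
```

If this approximation holds, a value such as "hello world" passes the pattern 
but fails the assertion, because the assertion rejects any value containing a 
space. An expression along the lines of normalize-space($value) != '' might be 
closer to "not blank"; that variant has not been timed against the posted 
document.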

> Validating XML against XSD is slow for long strings if pattern restrictions 
> are defined, even if maxLength is restricted.
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: XERCESJ-1716
>                 URL: https://issues.apache.org/jira/browse/XERCESJ-1716
>             Project: Xerces2-J
>          Issue Type: Bug
>            Reporter: Márk Petrényi
>            Priority: Major
>         Attachments: long_string.xml, unsafe.xsd, workaround.xsd
>
>
> Validating XML against XSD is slow for long strings if pattern restrictions 
> are defined, even if maxLength is restricted.
> We have the following simple type defined in our xsd (unsafe.xsd):
> {code:xml}
> <xsd:simpleType name="SimpleText255NotBlankType">
>  <xsd:annotation>
>  <xsd:documentation xml:lang="en">String of maximum 255 characters, not 
> blank</xsd:documentation>
>  </xsd:annotation>
>  <xsd:restriction base="xsd:string">
>  <xsd:minLength value="1"/>
>  <xsd:maxLength value="255"/>
>  <xsd:pattern value=".*[^\s].*"/>
>  </xsd:restriction>
> </xsd:simpleType>
> {code}
> The problem: when a very long string (ca. 1000000 characters) is provided 
> as a value in the input XML, we would expect it to be rejected quickly 
> because of its length. Instead, the validation takes several minutes, 
> since the regex is evaluated before the maxLength restriction.
> We found a workaround for the issue if we define the simpleType this way 
> (workaround.xsd):
> {code:xml}
>  <xsd:simpleType name="SimpleText255Type">
>  <xsd:annotation>
>  <xsd:documentation xml:lang="en">String of maximum 255 
> characters</xsd:documentation>
>  </xsd:annotation>
>  <xsd:restriction base="xsd:string">
>  <xsd:minLength value="1"/>
>  <xsd:maxLength value="255"/>
>  <xsd:pattern value=".{1,255}"/>
>  </xsd:restriction>
>  </xsd:simpleType>
>  <xsd:simpleType name="SimpleText255NotBlankType">
>  <xsd:annotation>
>  <xsd:documentation xml:lang="en">String of maximum 255 characters, not 
> blank</xsd:documentation>
>  </xsd:annotation>
>  <xsd:restriction base="SimpleText255Type">
>  <xsd:pattern value=".*[^\s].*"/>
>  </xsd:restriction>
>  </xsd:simpleType>
> {code}
> The workaround only works because the implementation of XSSimpleType 
> builds a Vector of the regex patterns, so the {{.{1,255}}} pattern is 
> evaluated first. It fails relatively quickly, so the time-consuming 
> second regex is never checked.
> It would be great to have the regex patterns checked after the other 
> XSD restrictions (minLength, maxLength, etc.) or to have control over the 
> validation order, thus avoiding unnecessarily slow validation and the 
> use of a workaround based on undocumented features.
> I attached the XSD files referenced above and an XML file containing a long 
> string value. The problem can be reproduced using the SourceValidator from 
> the Xerces2-J samples:
> The original xsd with slow validation:
> {code:java}
> java jaxp.SourceValidator -a unsafe.xsd -i long_string.xml
> {code}
> The workaround xsd with normal run-time:
> {code:java}
> java jaxp.SourceValidator -a workaround.xsd -i long_string.xml
> {code}
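
For anyone who prefers to reproduce this from code rather than from the 
command line, the following is a rough sketch using the standard 
javax.xml.validation API. It is not the SourceValidator sample itself. The 
inline schema copies the unsafe.xsd type from the description; the root 
element name "text", the class name, and the default length are assumptions 
made for illustration. Note that the JDK's built-in validator is not 
necessarily the same code as Xerces2-J, so timings may differ.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical timing harness for the pattern-before-maxLength behaviour.
final class PatternTimingSketch {

    // The simple type from unsafe.xsd plus an assumed root element "text".
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:element name='text' type='SimpleText255NotBlankType'/>"
      + "<xsd:simpleType name='SimpleText255NotBlankType'>"
      + "<xsd:restriction base='xsd:string'>"
      + "<xsd:minLength value='1'/>"
      + "<xsd:maxLength value='255'/>"
      + "<xsd:pattern value='.*[^\\s].*'/>"
      + "</xsd:restriction></xsd:simpleType></xsd:schema>";

    static Schema loadSchema() {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema did not compile", e);
        }
    }

    // Returns true if <text>value</text> validates. The value is inserted
    // as-is, so it must not contain XML markup characters.
    static boolean isValid(String value) {
        String xml = "<text>" + value + "</text>";
        try {
            loadSchema().newValidator()
                .validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the default error handler throws on validation errors
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Pass a larger length (e.g. 1000000) to approach the reported case.
        int length = args.length > 0 ? Integer.parseInt(args[0]) : 10_000;
        char[] chars = new char[length];
        Arrays.fill(chars, 'a');
        long start = System.nanoTime();
        boolean valid = isValid(new String(chars));
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("length=" + length + " valid=" + valid
            + " took " + millis + " ms");
    }
}
```

Running it with increasing lengths should show whether the time to reject an 
over-long value grows with the length of the string on a given validator.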



