[ https://issues.apache.org/jira/browse/XERCESJ-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17445809#comment-17445809 ]
Mukul Gandhi edited comment on XERCESJ-1716 at 11/20/21, 10:51 AM:
-------------------------------------------------------------------

I got a chance to look at the original bug report in this thread.

Instead of

<xs:simpleType name="SimpleText255NotBlankType">
   <xs:annotation>
      <xs:documentation xml:lang="en">String of maximum 255 characters, not blank</xs:documentation>
   </xs:annotation>
   <xs:restriction base="xs:string">
      <xs:minLength value="1"/>
      <xs:maxLength value="255"/>
      <xs:pattern value=".*[^\s].*"/>
   </xs:restriction>
</xs:simpleType>

we can write the following, which runs very fast on the provided XML document long_string.xml:

<xs:simpleType name="SimpleText255NotBlankType">
   <xs:annotation>
      <xs:documentation xml:lang="en">String of maximum 255 characters, not blank</xs:documentation>
   </xs:annotation>
   <xs:restriction base="xs:string">
      <xs:pattern value="[^\s]{1,255}"/>
   </xs:restriction>
</xs:simpleType>

I think that the Xerces XSD processor, in general, should not evaluate the xs:minLength and xs:maxLength facets before the xs:pattern facet. The XSD specification doesn't prescribe any such guideline, and implementers can treat the order of facet evaluation within a simple type as implementation dependent.


was (Author: mukul_gandhi):
    I got a chance to look at the original bug report in this thread.

Instead of

<xs:simpleType name="SimpleText255NotBlankType">
   <xs:annotation>
      <xs:documentation xml:lang="en">String of maximum 255 characters, not blank</xs:documentation>
   </xs:annotation>
   <xs:restriction base="xs:string">
      <xs:minLength value="1"/>
      <xs:maxLength value="255"/>
      <xs:pattern value=".*[^\s].*"/>
   </xs:restriction>
</xs:simpleType>

we can write the following, which runs very fast on the provided XML document long_string.xml:

<xs:simpleType name="SimpleText255NotBlankType">
   <xs:annotation>
      <xs:documentation xml:lang="en">String of maximum 255 characters, not blank</xs:documentation>
   </xs:annotation>
   <xs:restriction base="xs:string">
      <xs:pattern value="[^\s]\{1,255}"/>
   </xs:restriction>
</xs:simpleType>

I think that the Xerces XSD processor, in general, should not evaluate the xs:minLength and xs:maxLength facets before the xs:pattern facet. The XSD specification doesn't prescribe any such guideline, and implementers can treat the order of facet evaluation within a simple type as implementation dependent.

> Validating XML against XSD is slow for long strings if pattern restrictions are defined, even if maxLength is restricted.
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: XERCESJ-1716
>                 URL: https://issues.apache.org/jira/browse/XERCESJ-1716
>             Project: Xerces2-J
>          Issue Type: Improvement
>            Reporter: Márk Petrényi
>            Assignee: Mukul Gandhi
>            Priority: Major
>         Attachments: long_string.xml, unsafe.xsd, workaround.xsd
>
>
> Validating XML against XSD is slow for long strings if pattern restrictions
> are defined, even if maxLength is restricted.
> We have the following simple type defined in our xsd (unsafe.xsd):
> {code:xml}
> <xsd:simpleType name="SimpleText255NotBlankType">
>   <xsd:annotation>
>     <xsd:documentation xml:lang="en">String of maximum 255 characters, not blank</xsd:documentation>
>   </xsd:annotation>
>   <xsd:restriction base="xsd:string">
>     <xsd:minLength value="1"/>
>     <xsd:maxLength value="255"/>
>     <xsd:pattern value=".*[^\s].*"/>
>   </xsd:restriction>
> </xsd:simpleType>
> {code}
> The problem is that when a really long string (ca. 1000000 characters) is
> provided as a value in the input xml, we would assume that it is regarded as
> invalid quickly because of the length. Actually, the validation takes several
> minutes since the regex gets evaluated before the maxLength restriction.
> We found a workaround for the issue if we define the simpleType this way
> (workaround.xsd):
> {code:xml}
> <xsd:simpleType name="SimpleText255Type">
>   <xsd:annotation>
>     <xsd:documentation xml:lang="en">String of maximum 255 characters</xsd:documentation>
>   </xsd:annotation>
>   <xsd:restriction base="xsd:string">
>     <xsd:minLength value="1"/>
>     <xsd:maxLength value="255"/>
>     <xsd:pattern value=".{1,255}"/>
>   </xsd:restriction>
> </xsd:simpleType>
> <xsd:simpleType name="SimpleText255NotBlankType">
>   <xsd:annotation>
>     <xsd:documentation xml:lang="en">String of maximum 255 characters, not blank</xsd:documentation>
>   </xsd:annotation>
>   <xsd:restriction base="SimpleText255Type">
>     <xsd:pattern value=".*[^\s].*"/>
>   </xsd:restriction>
> </xsd:simpleType>
> {code}
> The workaround only works because the implementation of the XSSimpleType
> builds a Vector of the regex patterns and the {{.{1,255}}} pattern will be
> evaluated first; it fails relatively quickly, so the time-consuming
> second regex won't be checked.
> It would be great to have the regex pattern checked after validating other
> xsd restrictions (minLength, maxLength, etc.) or to have control over the
> validation ordering, thus avoiding unnecessarily slow validations and the
> use of a workaround based on undocumented features.
> I attached the XSD files referenced above and an xml containing a long string
> value. The problem can be checked using the SourceValidator from Xerces2-J
> samples:
> The original xsd with slow validation:
> {code:java}
> java jaxp.SourceValidator -a unsafe.xsd -i long_string.xml
> {code}
> The workaround xsd with normal run-time:
> {code:java}
> java jaxp.SourceValidator -a workaround.xsd -i long_string.xml
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: j-dev-unsubscr...@xerces.apache.org
For additional commands, e-mail: j-dev-h...@xerces.apache.org
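
For illustration, the facet ordering requested in XERCESJ-1716 (cheap length checks first, the pattern last) could be sketched in plain Java roughly as follows. This is not Xerces2-J code: the class name FacetOrderSketch and the method isValid are hypothetical, and java.util.regex stands in for Xerces' own regular expression engine, so timings and some matching details (for example, which characters "." excludes) may differ from the XSD processor.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Xerces2-J: it evaluates the cheap length
// facets before the potentially expensive pattern facet.
final class FacetOrderSketch {
    // Values taken from the minLength/maxLength facets in unsafe.xsd.
    private static final int MIN_LENGTH = 1;
    private static final int MAX_LENGTH = 255;

    // Same expression as the xsd:pattern in unsafe.xsd, compiled with
    // java.util.regex (an assumption; Xerces uses its own regex engine).
    private static final Pattern NOT_BLANK = Pattern.compile(".*[^\\s].*");

    static boolean isValid(String value) {
        // XSD measures xs:string length in characters, so count code points
        // rather than UTF-16 code units.
        int length = value.codePointCount(0, value.length());
        if (length < MIN_LENGTH || length > MAX_LENGTH) {
            return false; // rejected without ever running the regex
        }
        // matches() requires the whole string to match, which mirrors the
        // implicit anchoring of an XSD pattern.
        return NOT_BLANK.matcher(value).matches();
    }

    public static void main(String[] args) {
        // Build a value of 1000000 characters, similar in size to the one
        // described for long_string.xml.
        char[] chars = new char[1_000_000];
        Arrays.fill(chars, 'a');
        String longValue = new String(chars);

        System.out.println(isValid("hello"));   // prints "true"
        System.out.println(isValid("   "));     // prints "false" (blank)
        System.out.println(isValid(longValue)); // prints "false" (too long)
    }
}
```

With this ordering, the 1000000-character value is rejected by the length check alone, and the regex only ever runs on strings of at most 255 characters.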