[ https://issues.apache.org/jira/browse/XERCESJ-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968107#comment-16968107 ]
Mukul Gandhi commented on XERCESJ-1716:
---------------------------------------

Out of curiosity, I tried your unsafe.xsd schema with the type "SimpleText255NotBlankType" written as follows:

<xs:simpleType name="SimpleText255NotBlankType">
    <xs:annotation>
        <xs:documentation xml:lang="en">String of maximum 255 characters, not blank</xs:documentation>
    </xs:annotation>
    <xs:restriction base="xs:string">
        <xs:minLength value="1"/>
        <xs:maxLength value="255"/>
        <xs:assertion test="not(contains($value, ' '))"/>
    </xs:restriction>
</xs:simpleType>

(I'm using an XSD 1.1 <assertion> facet instead of <pattern>.)

This validates quickly with the .xml document you've posted, so it seems to be another workaround for your use case.

> Validating XML against XSD is slow for long strings if pattern restrictions
> are defined, even if maxLength is restricted.
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: XERCESJ-1716
>                 URL: https://issues.apache.org/jira/browse/XERCESJ-1716
>             Project: Xerces2-J
>          Issue Type: Bug
>            Reporter: Márk Petrényi
>            Priority: Major
>         Attachments: long_string.xml, unsafe.xsd, workaround.xsd
>
>
> Validating XML against XSD is slow for long strings if pattern restrictions
> are defined, even if maxLength is restricted.
> We have the following simple type defined in our xsd (unsafe.xsd):
> {code:xml}
> <xsd:simpleType name="SimpleText255NotBlankType">
>     <xsd:annotation>
>         <xsd:documentation xml:lang="en">String of maximum 255 characters, not blank</xsd:documentation>
>     </xsd:annotation>
>     <xsd:restriction base="xsd:string">
>         <xsd:minLength value="1"/>
>         <xsd:maxLength value="255"/>
>         <xsd:pattern value=".*[^\s].*"/>
>     </xsd:restriction>
> </xsd:simpleType>
> {code}
> The problem is that when a really long string (ca. 1000000 characters) is provided
> as a value in the input xml, we would assume that it is regarded as invalid
> quickly because of the length.
> Actually the validation takes several minutes
> since the regex gets evaluated before the maxLength restriction.
> We found a workaround for the issue if we define the simpleType this way
> (workaround.xsd):
> {code:xml}
> <xsd:simpleType name="SimpleText255Type">
>     <xsd:annotation>
>         <xsd:documentation xml:lang="en">String of maximum 255 characters</xsd:documentation>
>     </xsd:annotation>
>     <xsd:restriction base="xsd:string">
>         <xsd:minLength value="1"/>
>         <xsd:maxLength value="255"/>
>         <xsd:pattern value=".{1,255}"/>
>     </xsd:restriction>
> </xsd:simpleType>
> <xsd:simpleType name="SimpleText255NotBlankType">
>     <xsd:annotation>
>         <xsd:documentation xml:lang="en">String of maximum 255 characters, not blank</xsd:documentation>
>     </xsd:annotation>
>     <xsd:restriction base="SimpleText255Type">
>         <xsd:pattern value=".*[^\s].*"/>
>     </xsd:restriction>
> </xsd:simpleType>
> {code}
> The workaround only works because the implementation of XSSimpleType
> builds a Vector of the regex patterns, so the {{.{1,255}}} pattern is
> evaluated first. It fails relatively quickly, and the time-consuming
> second regex is never checked.
> It would be great to have the regex pattern checked after validating the other
> xsd restrictions (minLength, maxLength, etc.), or to have control over the
> validation ordering, thus avoiding unnecessarily slow validations and the
> use of a workaround based on undocumented features.
> I attached the xsd-s referenced above and an xml containing a long string
> value.
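For what it's worth, the ordering asked for in the description (cheap length facets first, pattern last) can be sketched in plain Java. This is only an illustration and not how Xerces2-J implements facet checking: the class name FacetOrderSketch and the method isValid are made up for the example, and java.util.regex stands in for the Xerces regex engine, so timings will differ.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical illustration only; not Xerces2-J code.
class FacetOrderSketch {
    // Same expression as the xsd:pattern in unsafe.xsd, compiled with
    // java.util.regex (an assumption; Xerces uses its own regex engine).
    private static final Pattern NOT_BLANK = Pattern.compile(".*[^\\s].*");

    static boolean isValid(String value) {
        // Length facets first: counting characters is linear and cheap.
        int length = value.codePointCount(0, value.length());
        if (length < 1 || length > 255) {
            return false; // rejected here, the regex is never evaluated
        }
        // Pattern facet last; matches() anchors the whole value, as XSD patterns do.
        return NOT_BLANK.matcher(value).matches();
    }

    public static void main(String[] args) {
        char[] chars = new char[1_000_000];
        Arrays.fill(chars, 'a');
        String longValue = new String(chars);

        System.out.println(isValid(longValue)); // false: too long, regex skipped
        System.out.println(isValid("hello"));   // true
        System.out.println(isValid("   "));     // false: only whitespace
    }
}
```

With this ordering the million-character value is rejected on its length alone, which is the behaviour the workaround.xsd trick gets indirectly through the {{.{1,255}}} pattern.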
> The problem can be checked using the SourceValidator from Xerces2-J
> samples:
> The original xsd with slow validation:
> {code:java}
> java jaxp.SourceValidator -a unsafe.xsd -i long_string.xml
> {code}
> The workaround xsd with normal run-time:
> {code:java}
> java jaxp.SourceValidator -a workaround.xsd -i long_string.xml
> {code}
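If the samples jar is not at hand, roughly the same check can be run through the standard JAXP validation API. The following is a minimal sketch, assuming the schema and instance files attached to the issue are passed on the command line. ValidationTimer and isValid are hypothetical names, not part of Xerces2-J. Which validator implementation SchemaFactory returns depends on the classpath (the JDK bundles its own Xerces derivative), so the timings may not match a stock Xerces2-J.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of the Xerces2-J samples.
class ValidationTimer {
    /** Validates the instance against the schema and prints the elapsed time. */
    static boolean isValid(Source schema, Source instance) {
        Validator validator;
        try {
            SchemaFactory factory =
                    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            validator = factory.newSchema(schema).newValidator();
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema could not be compiled", e);
        }

        long start = System.nanoTime();
        try {
            // With no ErrorHandler set, the first validation error is thrown.
            validator.validate(instance);
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.println("validation took " + millis + " ms");
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: java ValidationTimer <schema.xsd> <instance.xml>");
            return;
        }
        boolean ok = isValid(new StreamSource(new File(args[0])),
                             new StreamSource(new File(args[1])));
        System.out.println(ok ? "valid" : "invalid");
    }
}
```

Running it once with unsafe.xsd and once with workaround.xsd against long_string.xml should make the difference in run-time visible, provided the validator in use behaves like the one in the report.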