[ https://issues.apache.org/jira/browse/XERCESJ-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968274#comment-16968274 ]
Márk Petrényi commented on XERCESJ-1716:
----------------------------------------

Hi,

thank you for your suggestion. In our project we actually need to unmarshal the XML with JAXB (I simplified the use case in the ticket description to concentrate on the root cause of the slow processing). Unfortunately JAXB does not support XSD 1.1, so it's not a viable solution for us.

> Validating XML against XSD is slow for long strings if pattern restrictions
> are defined, even if maxLength is restricted.
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: XERCESJ-1716
>                 URL: https://issues.apache.org/jira/browse/XERCESJ-1716
>             Project: Xerces2-J
>          Issue Type: Bug
>            Reporter: Márk Petrényi
>            Priority: Major
>         Attachments: long_string.xml, unsafe.xsd, workaround.xsd
>
>
> Validating XML against XSD is slow for long strings if pattern restrictions
> are defined, even if maxLength is restricted.
> We have the following simple type defined in our XSD (unsafe.xsd):
> {code:xml}
> <xsd:simpleType name="SimpleText255NotBlankType">
>   <xsd:annotation>
>     <xsd:documentation xml:lang="en">String of maximum 255 characters, not blank</xsd:documentation>
>   </xsd:annotation>
>   <xsd:restriction base="xsd:string">
>     <xsd:minLength value="1"/>
>     <xsd:maxLength value="255"/>
>     <xsd:pattern value=".*[^\s].*"/>
>   </xsd:restriction>
> </xsd:simpleType>
> {code}
> When a really long string (ca. 1000000 characters) is provided as a value
> in the input XML, we would expect it to be rejected quickly because of its
> length. In fact, the validation takes several minutes, since the regex is
> evaluated before the maxLength restriction.
> We found a workaround for the issue if we define the simpleType this way
> (workaround.xsd):
> {code:xml}
> <xsd:simpleType name="SimpleText255Type">
>   <xsd:annotation>
>     <xsd:documentation xml:lang="en">String of maximum 255 characters</xsd:documentation>
>   </xsd:annotation>
>   <xsd:restriction base="xsd:string">
>     <xsd:minLength value="1"/>
>     <xsd:maxLength value="255"/>
>     <xsd:pattern value=".{1,255}"/>
>   </xsd:restriction>
> </xsd:simpleType>
> <xsd:simpleType name="SimpleText255NotBlankType">
>   <xsd:annotation>
>     <xsd:documentation xml:lang="en">String of maximum 255 characters, not blank</xsd:documentation>
>   </xsd:annotation>
>   <xsd:restriction base="SimpleText255Type">
>     <xsd:pattern value=".*[^\s].*"/>
>   </xsd:restriction>
> </xsd:simpleType>
> {code}
> The workaround only works because the implementation of XSSimpleType
> builds a Vector of the regex patterns, so the {{.{1,255}}} pattern is
> evaluated first. It fails relatively quickly, and thus the time-consuming
> second regex won't be checked.
> It would be great to have the regex pattern checked after the other
> XSD restrictions (minLength, maxLength, etc.) have been validated, or to
> have control over the validation order, thus avoiding unnecessarily slow
> validations and the use of a workaround based on undocumented features.
> I attached the XSDs referenced above and an XML file containing a long
> string value. The problem can be reproduced using the SourceValidator from
> the Xerces2-J samples:
> The original XSD with slow validation:
> {code:java}
> java jaxp.SourceValidator -a unsafe.xsd -i long_string.xml
> {code}
> The workaround XSD with normal run time:
> {code:java}
> java jaxp.SourceValidator -a workaround.xsd -i long_string.xml
> {code}
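
For illustration, the ordering requested in the description (cheap length facets first, regex pattern last) can be sketched in plain Java. This is not Xerces code and does not reflect how XSSimpleType is implemented; the class name FacetOrderSketch, the method isValid and the constants are hypothetical, and java.util.regex only approximates XSD regex semantics (for example, its \s matches a slightly larger set of whitespace characters).

```java
import java.util.regex.Pattern;

// Hypothetical sketch, not part of Xerces2-J: checks the length facets
// before the pattern facet, as the issue description asks for.
final class FacetOrderSketch {
    private static final int MIN_LENGTH = 1;
    private static final int MAX_LENGTH = 255;

    // Approximation of the XSD pattern ".*[^\s].*" using java.util.regex.
    // matches() anchors the expression at both ends, as XSD patterns are.
    private static final Pattern NOT_BLANK = Pattern.compile(".*[^\\s].*");

    static boolean isValid(String value) {
        // XSD length facets count characters, so count code points
        // instead of UTF-16 code units.
        int length = value.codePointCount(0, value.length());
        if (length < MIN_LENGTH || length > MAX_LENGTH) {
            return false; // rejected before the regex is ever evaluated
        }
        return NOT_BLANK.matcher(value).matches();
    }

    public static void main(String[] args) {
        StringBuilder longValue = new StringBuilder();
        for (int i = 0; i < 1_000_000; i++) {
            longValue.append('a');
        }
        System.out.println(isValid("hello"));
        System.out.println(isValid(longValue.toString()));
    }
}
```

With this ordering the cost of rejecting an over-long value depends only on counting its characters; the pattern is evaluated only for values that already satisfy minLength and maxLength.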