[ https://issues.apache.org/jira/browse/XERCESJ-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mukul Gandhi updated XERCESJ-1716:
----------------------------------
    Issue Type: Improvement  (was: Bug)

I'm changing the issue type of this report, reclassifying it as a possible performance improvement.

> Validating XML against XSD is slow for long strings if pattern restrictions are defined, even if maxLength is restricted.
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: XERCESJ-1716
>                 URL: https://issues.apache.org/jira/browse/XERCESJ-1716
>             Project: Xerces2-J
>          Issue Type: Improvement
>            Reporter: Márk Petrényi
>            Priority: Major
>         Attachments: long_string.xml, unsafe.xsd, workaround.xsd
>
>
> Validating XML against XSD is slow for long strings if pattern restrictions are defined, even if maxLength is restricted.
> We have the following simple type defined in our XSD (unsafe.xsd):
> {code:xml}
> <xsd:simpleType name="SimpleText255NotBlankType">
>   <xsd:annotation>
>     <xsd:documentation xml:lang="en">String of maximum 255 characters, not blank</xsd:documentation>
>   </xsd:annotation>
>   <xsd:restriction base="xsd:string">
>     <xsd:minLength value="1"/>
>     <xsd:maxLength value="255"/>
>     <xsd:pattern value=".*[^\s].*"/>
>   </xsd:restriction>
> </xsd:simpleType>
> {code}
> The problem is that when a really long string (ca. 1000000 characters) is provided as a value in the input XML, we would expect it to be rejected quickly because of its length. In fact, the validation takes several minutes, since the regex is evaluated before the maxLength restriction.
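
The ordering the reporter expects can be illustrated outside Xerces. What follows is a minimal Java sketch under stated assumptions, not Xerces2-J code: it uses java.util.regex as a stand-in for the XSD regex engine (the two differ in details, for example in which characters \s matches), and the class and method names (FacetOrderSketch, isValid) are hypothetical. It checks the cheap length facets first and runs the pattern only on values that already fit within 1 to 255 characters.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class FacetOrderSketch {
    // Stand-in for the xsd:pattern facet ".*[^\s].*" (assumption: java.util.regex
    // is close enough to the XSD regex dialect for this illustration).
    private static final Pattern NOT_BLANK = Pattern.compile(".*[^\\s].*");
    private static final int MIN_LENGTH = 1;
    private static final int MAX_LENGTH = 255;

    // Length facets are checked first, so an oversized value is rejected
    // without the regex ever being run against it.
    static boolean isValid(String value) {
        // Count code points rather than UTF-16 code units.
        int length = value.codePointCount(0, value.length());
        if (length < MIN_LENGTH || length > MAX_LENGTH) {
            return false;
        }
        // matches() requires the whole value to match, as an XSD pattern does.
        return NOT_BLANK.matcher(value).matches();
    }

    public static void main(String[] args) {
        char[] chars = new char[1_000_000];
        Arrays.fill(chars, 'a');
        String tooLong = new String(chars);

        System.out.println(isValid(tooLong)); // false: longer than 255
        System.out.println(isValid("   "));   // false: only whitespace
        System.out.println(isValid("hello")); // true
    }
}
```

With this ordering, the cost of rejecting the oversized value is a single length count, whatever the pattern looks like.
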
> We found a workaround for the issue if we define the simpleType this way (workaround.xsd):
> {code:xml}
> <xsd:simpleType name="SimpleText255Type">
>   <xsd:annotation>
>     <xsd:documentation xml:lang="en">String of maximum 255 characters</xsd:documentation>
>   </xsd:annotation>
>   <xsd:restriction base="xsd:string">
>     <xsd:minLength value="1"/>
>     <xsd:maxLength value="255"/>
>     <xsd:pattern value=".{1,255}"/>
>   </xsd:restriction>
> </xsd:simpleType>
> <xsd:simpleType name="SimpleText255NotBlankType">
>   <xsd:annotation>
>     <xsd:documentation xml:lang="en">String of maximum 255 characters, not blank</xsd:documentation>
>   </xsd:annotation>
>   <xsd:restriction base="SimpleText255Type">
>     <xsd:pattern value=".*[^\s].*"/>
>   </xsd:restriction>
> </xsd:simpleType>
> {code}
> The workaround only works because the implementation of XSSimpleType builds a Vector of the regex patterns: the {{.{1,255}}} pattern is evaluated first and fails relatively quickly, so the time-consuming second regex is never checked.
> It would be great to have the regex pattern checked after the other XSD restrictions (minLength, maxLength, etc.) have been validated, or to have control over the validation order. That would avoid unnecessarily slow validations and the need for a workaround based on undocumented behavior.
> I attached the XSDs referenced above and an XML file containing a long string value. The problem can be checked using the SourceValidator from the Xerces2-J samples:
> The original XSD with slow validation:
> {code:java}
> java jaxp.SourceValidator -a unsafe.xsd -i long_string.xml
> {code}
> The workaround XSD with normal run time:
> {code:java}
> java jaxp.SourceValidator -a workaround.xsd -i long_string.xml
> {code}
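
The two runs above can also be timed programmatically through the standard JAXP validation API. This is a hedged sketch rather than part of the report: the class name ValidationTimer and its isValid helper are hypothetical, and which schema validator actually runs depends on what the JAXP lookup finds on the classpath, so the timings only reflect Xerces2-J if its implementation is the one picked up.

```java
import java.io.File;
import java.io.IOException;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class ValidationTimer {
    // Returns true if the instance validates against the schema and false on a
    // validation error. A broken schema still surfaces as a SAXException.
    static boolean isValid(Source xsd, Source xml) throws SAXException, IOException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(xsd);
        Validator validator = schema.newValidator();
        try {
            // With no ErrorHandler set, validation errors are thrown as SAXException.
            validator.validate(xml);
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java ValidationTimer <schema.xsd> <instance.xml>");
            return;
        }
        long start = System.nanoTime();
        boolean valid = isValid(new StreamSource(new File(args[0])),
                                new StreamSource(new File(args[1])));
        // Elapsed time covers schema compilation as well as validation.
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("valid=" + valid + " elapsed=" + elapsedMs + " ms");
    }
}
```

Running it once as `java ValidationTimer unsafe.xsd long_string.xml` and once as `java ValidationTimer workaround.xsd long_string.xml` would give two elapsed times to compare.
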