[
https://issues.apache.org/jira/browse/XERCESJ-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475112#comment-13475112
]
Martin Honnen commented on XERCESJ-1592:
----------------------------------------
I had to dig deep to find the property, so for completeness I'll mention what I
found:
http://svn.apache.org/viewvc/xerces/java/trunk/src/org/apache/xerces/impl/dv/xs/TypeValidator.java?revision=1375610&view=markup&pathrev=1375610
which says "Introducing a system property for controlling how string length is
computed by the schema validator. When
org.apache.xerces.impl.dv.xs.useCodePointCountForStringLength=true, the length
of an xs:string or xs:anyURI value is calculated by counting the number of
Unicode code points in the string. The value of the system property is false by
default, preserving the long standing behaviour of computing length in Java
chars (i.e. String.length()).".
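To make the difference concrete, here is a minimal sketch of the two counting modes the property switches between. It is not the Xerces TypeValidator code: the class `StringLengthModes` and its methods are hypothetical, and only the property name comes from the commit message quoted above. With a real Xerces build the property would normally be passed on the command line as `-Dorg.apache.xerces.impl.dv.xs.useCodePointCountForStringLength=true`; setting it with `System.setProperty` inside the application may only work if done before the validator classes read it (an assumption, not verified against the Xerces source).

```java
/**
 * Hypothetical sketch of the two length-counting modes described in the
 * Xerces commit message. This is NOT the actual TypeValidator implementation.
 */
public class StringLengthModes {

    // Property name taken verbatim from the commit message.
    static final String PROPERTY =
        "org.apache.xerces.impl.dv.xs.useCodePointCountForStringLength";

    /** Default behaviour (property false): count UTF-16 chars, i.e. String.length(). */
    static int javaCharLength(String value) {
        return value.length();
    }

    /** Behaviour with the property set to true: count Unicode code points. */
    static int codePointLength(String value) {
        return value.codePointCount(0, value.length());
    }

    /** Chooses the counting mode from the system property, defaulting to char counting. */
    static int schemaLength(String value) {
        // Boolean.getBoolean is true only if the property exists and equals "true" (ignoring case).
        return Boolean.getBoolean(PROPERTY) ? codePointLength(value) : javaCharLength(value);
    }

    public static void main(String[] args) {
        String oldItalicA = "\uD800\uDF00"; // U+10300, a single character outside the BMP
        System.out.println("property as launched: " + schemaLength(oldItalicA));
        System.setProperty(PROPERTY, "true");
        System.out.println("property set to true: " + schemaLength(oldItalicA));
    }
}
```

Under the default mode the one-character value has length 2, which is exactly what makes an `xs:length` facet of 1 fail; in code-point mode it has length 1.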
> schema validation incorrectly treating single character outside of BMP as two
> characters
> ----------------------------------------------------------------------------------------
>
> Key: XERCESJ-1592
> URL: https://issues.apache.org/jira/browse/XERCESJ-1592
> Project: Xerces2-J
> Issue Type: Bug
> Components: XML Schema 1.0 Datatypes
> Affects Versions: 2.11.0
> Environment: Windows 7, Oracle Java JRE 1.7
> Reporter: Martin Honnen
>
> When validating the instance document
> http://home.arcor.de/martin.honnen/xml/oneCharInstance1.xml against the
> schema http://home.arcor.de/martin.honnen/xml/oneCharSchema1.xsd Xerces
> reports the following validation error(s):
> "[Error] oneCharInstance1.xml:3:25: cvc-length-valid: Value '𐌀' with length = '2'
> is not facet-valid with respect to length '1' for type 'one-char'.
> [Error] oneCharInstance1.xml:3:25: cvc-type.3.1.3: The value '𐌀' of element
> 'test' is not valid."
> The "test" element, however, contains a single character
> (<test>𐌀</test>), albeit one outside the BMP. In terms of
> the XML specification http://www.w3.org/TR/xml/#dt-character and the schema
> data type specification http://www.w3.org/TR/xmlschema-2/#string, there is no
> difference between characters inside and outside the BMP; each one counts
> as a single character.
> So the sample XML is valid against the sample schema and Xerces should not
> report any error.
> Other validating parsers like Saxon 9.4 EE and XSV
> (http://www.w3.org/2001/03/webdata/xsv?docAddrs=http%3A%2F%2Fhome.arcor.de%2Fmartin.honnen%2Fxml%2FoneCharInstance1.xml+http%3A%2F%2Fhome.arcor.de%2Fmartin.honnen%2Fxml%2FoneCharSchema1.xsd&warnings=on&keepGoing=on&style=xsl#)
> don't report any validation error for the samples named above.
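The reason Xerces reports length 2 is that Java strings are sequences of UTF-16 chars, and the single character in `<test>𐌀</test>` (U+10300) is stored as a surrogate pair. The sketch below illustrates this using only standard `java.lang` methods; the class name `SurrogatePairDemo` and the helper `xmlCharacterCount` are hypothetical, chosen for illustration, and the helper is just one way to count characters the way the XML and XML Schema specifications define them (one per code point).

```java
/** Shows how one supplementary character (U+10300) is represented in a Java String. */
public class SurrogatePairDemo {

    // The character inside <test>...</test> in the sample instance document.
    static final int OLD_ITALIC_A = 0x10300;

    /** Hypothetical helper: counts characters as XML does, one per Unicode code point. */
    static long xmlCharacterCount(String s) {
        return s.codePoints().count();
    }

    public static void main(String[] args) {
        // Build the String from the code point; Java stores it as two UTF-16 chars.
        String s = new String(Character.toChars(OLD_ITALIC_A));

        System.out.printf("chars needed:       %d%n", Character.charCount(OLD_ITALIC_A));
        System.out.printf("high surrogate:     U+%04X%n", (int) s.charAt(0));
        System.out.printf("low surrogate:      U+%04X%n", (int) s.charAt(1));
        System.out.printf("String.length():    %d%n", s.length());
        System.out.printf("XML character count: %d%n", xmlCharacterCount(s));
    }
}
```

Because U+10300 lies above U+FFFF, it is encoded as the pair U+D800 U+DF00, so `String.length()` sees two units while a code-point count sees one.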