[ https://issues.apache.org/jira/browse/XERCESJ-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17465116#comment-17465116 ]
J Morris commented on XERCESJ-1726:
-----------------------------------

{{Hi,

This is all a long time ago now. That is not to say that I have lost interest though.

My first observation is that the assert that you quoted in your comment does not appear to correspond to the one in the *test1.xsd* file in the *testX.zip* attached to the bug report. The one that you quoted was:

<xs:assert test="./text()[matches(.,'^([a-z] {2}[a-z]?-[A-Z]{2} \.((((com)|(lib)|(mod([a-z][a-z0-9])?)|(plg[a-z][a-z0-9](-[a-z0-9][a-z0-9])*)|(tpl))_[a-z][a-z0-9](\.sys)?(\.ini))|(ini)|(css)|(localise\.php)))|([a-z] {2}[a-z]?-[A-Z]{2} \.xml)|(install\.xml)$')]"/>

whereas my one in the *test1.xsd* file was:

<xs:assert test="./text()[matches(.,'^([a-z]{2}[a-z]?-[A-Z]{2}\.((((com)|(lib)|(mod(_[a-z][a-z0-9]+)?)|(plg_[a-z][a-z0-9]+(\-[a-z0-9][a-z0-9]+)*)|(tpl))_[a-z][a-z0-9]+(\.sys)?(\.ini))|(ini)|(css)|(localise\.php)))|([a-z]{2}[a-z]?-[A-Z]{2}\.xml)|(install\.xml)$')]"/>

You will notice differences in that some of the key repeat counts in the regex are missing in your version, as well as some of the underscores ("_").

The intention of my version was to match lines with *text()* in any of the following formats:

1) ^[a-z]{2}[a-z]?-[A-Z]{2}\.com_[a-z][a-z0-9]+(\.sys)?\.ini$
2) ^[a-z]{2}[a-z]?-[A-Z]{2}\.lib_[a-z][a-z0-9]+(\.sys)?\.ini$
3) ^[a-z]{2}[a-z]?-[A-Z]{2}\.mod(_[a-z][a-z0-9]+)?_[a-z][a-z0-9]+(\.sys)?\.ini$
4) ^[a-z]{2}[a-z]?-[A-Z]{2}\.plg_[a-z][a-z0-9]+(\-[a-z0-9][a-z0-9]+)*_[a-z][a-z0-9]+(\.sys)?\.ini$
5) ^[a-z]{2}[a-z]?-[A-Z]{2}\.tpl_[a-z][a-z0-9]+(\.sys)?\.ini$
6) ^[a-z]{2}[a-z]?-[A-Z]{2}\.css$
7) ^[a-z]{2}[a-z]?-[A-Z]{2}\.ini$
8) ^[a-z]{2}[a-z]?-[A-Z]{2}\.localise\.php$
9) ^[a-z]{2}[a-z]?-[A-Z]{2}\.xml$
10) ^install\.xml$

Note: The pattern *[a-z]{2}[a-z]?-[A-Z]{2}* is a language prefix (two or three lowercase alphabetic characters followed by a minus followed by exactly two uppercase alphabetic characters) in the same spirit as locale settings in operating systems.
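
For anyone wanting to check the regex separately from the validator, here is a minimal sketch in plain Java. It compares the ten formats listed above with the combined expression from the *test1.xsd* assert. The class name, the helper methods and the sample file names are hypothetical and are not taken from *test1.xml*. It also assumes that java.util.regex agrees with the XPath regex flavour for the constructs used here, which may not hold in every detail. Matcher.find() stands in for the substring semantics of the XPath matches() function.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class FileNamePatternCheck {
    // Language prefix and the common "_name(.sys)?.ini" tail, as described in the comment.
    private static final String LANG = "[a-z]{2}[a-z]?-[A-Z]{2}";
    private static final String TAIL = "_[a-z][a-z0-9]+(\\.sys)?\\.ini";

    // The ten intended formats, each anchored at both ends.
    static final List<Pattern> FORMATS = Arrays.asList(
        Pattern.compile("^" + LANG + "\\.com" + TAIL + "$"),
        Pattern.compile("^" + LANG + "\\.lib" + TAIL + "$"),
        Pattern.compile("^" + LANG + "\\.mod(_[a-z][a-z0-9]+)?" + TAIL + "$"),
        Pattern.compile("^" + LANG + "\\.plg_[a-z][a-z0-9]+(\\-[a-z0-9][a-z0-9]+)*" + TAIL + "$"),
        Pattern.compile("^" + LANG + "\\.tpl" + TAIL + "$"),
        Pattern.compile("^" + LANG + "\\.css$"),
        Pattern.compile("^" + LANG + "\\.ini$"),
        Pattern.compile("^" + LANG + "\\.localise\\.php$"),
        Pattern.compile("^" + LANG + "\\.xml$"),
        Pattern.compile("^install\\.xml$"));

    // The single expression from the assert in test1.xsd, with backslashes doubled for Java.
    static final Pattern COMBINED = Pattern.compile(
        "^([a-z]{2}[a-z]?-[A-Z]{2}\\.((((com)|(lib)|(mod(_[a-z][a-z0-9]+)?)"
        + "|(plg_[a-z][a-z0-9]+(\\-[a-z0-9][a-z0-9]+)*)|(tpl))_[a-z][a-z0-9]+(\\.sys)?(\\.ini))"
        + "|(ini)|(css)|(localise\\.php)))|([a-z]{2}[a-z]?-[A-Z]{2}\\.xml)|(install\\.xml)$");

    // True if the text matches at least one of the ten anchored formats.
    static boolean matchesAnyFormat(String text) {
        for (Pattern p : FORMATS) {
            if (p.matcher(text).find()) {
                return true;
            }
        }
        return false;
    }

    // True if the combined expression matches anywhere in the text.
    static boolean matchesCombined(String text) {
        return COMBINED.matcher(text).find();
    }

    public static void main(String[] args) {
        // Hypothetical sample names, chosen only to exercise the formats.
        String[] samples = {
            "en-GB.com_content.ini", "fr-FR.plg_system_cache.sys.ini",
            "en-GB.mod_menu.ini", "en-GB.localise.php", "install.xml",
            "EN-gb.ini", "xx_en-GB.xml"
        };
        for (String s : samples) {
            System.out.println(s + " formats=" + matchesAnyFormat(s)
                + " combined=" + matchesCombined(s));
        }
    }
}
```

In the combined expression the leading ^ belongs only to the first top-level alternative and the trailing $ only to the last one, because | binds more loosely than either anchor. The combined form therefore accepts somewhat more than the ten anchored formats (for example, the hypothetical name xx_en-GB.xml). That looseness would not by itself explain valid lines being rejected.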

This set of 10 patterns was supposed to match the *text()* entries for ALL of the lines in the *test1.xml* file, so it was a surprise to me to get any errors reported.

Thank you for your continuing interest.}}

> Possible Bug: Xerces 2.12.1 for XML Validation with XSD 1.1 Schema under Java
> -----------------------------------------------------------------------------
>
> Key: XERCESJ-1726
> URL: https://issues.apache.org/jira/browse/XERCESJ-1726
> Project: Xerces2-J
> Issue Type: Bug
> Components: Samples
> Affects Versions: 2.12.1
> Environment: Windows 7
> Java 1.8.0_261
> Xerces-J 2.12.1
> Reporter: J Morris
> Priority: Major
> Labels: test
> Attachments: testX.zip, test_cases_ mukul.zip
>
> Original Estimate: 72h
> Remaining Estimate: 72h
>
> I have recently been trying to validate the XML file *test1.xml* with a
> schema *test.xsd* containing *assert*/*assertion* constructs, using the
> sample program *jaxp.SourceValidator*.
> Unexpectedly, the result was several reported errors in what appeared to be
> syntactically correct and valid XML lines (*test1.xml*: 9 errors).
> After significant experimentation, it appeared that these errors were
> occurring at line numbers which the validation found troublesome. Inserting
> an extra line at one of the troublesome line numbers made the previously
> erroneous line (now *not* appearing at a troublesome line number) pass
> validation. On the other hand, the newly inserted line (occupying the
> troublesome line number) would fail validation.
> I tentatively interpreted this as meaning that *the validation errors were
> not real* and began to try to develop a test-case, as similar as possible to
> *test1.xml*, but which passed validation. The result was *test2.xml*, which
> was generated from *test1.xml* by inserting XML comment lines at each of the
> troublesome line numbers, thereby displacing the previously erroneous lines
> to non-troublesome line numbers.
> Since XML comment lines do not require
> validation, this file passed validation for me (*test2.xml*: 0 errors).
> I then contacted Mukul Gandhi and he re-ran my validations *but came to a
> different result*. He saw errors in both XML files (*test1.xml*: 9 errors;
> *test2.xml*: 18 errors). Despite our joint efforts to achieve convergence
> between our respective validation runs, we have not so far succeeded.
> Mukul did point out a couple of things:
> 1) The way that I was using the "matches" function in the *assert*
> constructs. His experience suggested that this was unreliable. However, I was
> not certain whether this would have led to the type of behaviour that I was
> seeing (apparent troublesome line numbers).
> 2) He found that certain characters (probably the two accented French
> characters) in my XML files were not supported in the default XML encoding
> scheme, UTF-8. However, for me, no errors were reported for those by the
> validation program *jaxp.SourceValidator*.
> I would be very grateful for some help in getting to the bottom of this
> (both the original behaviour and the discrepancies with Mukul's validation
> runs).

--
This message was sent by Atlassian Jira (v8.20.1#820001)