[jira] [Commented] (XERCESJ-1754) XMLSchemaValidator reset no longer resets id validation caches
[ https://issues.apache.org/jira/browse/XERCESJ-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17705164#comment-17705164 ]

Radu Coravu commented on XERCESJ-1754:
--------------------------------------

Thanks [~mukul_gandhi]

> XMLSchemaValidator reset no longer resets id validation caches
> --------------------------------------------------------------
>
>                 Key: XERCESJ-1754
>                 URL: https://issues.apache.org/jira/browse/XERCESJ-1754
>             Project: Xerces2-J
>          Issue Type: Bug
>          Components: XML Schema 1.1 Structures
>            Reporter: Radu Coravu
>            Assignee: Mukul Gandhi
>            Priority: Major
>         Attachments: test.xml, test.xsd
>
>
> I validate an XML with an XML Schema 1.1 file.
> On the first validation the XML is reported valid.
> On the second validation I re-use the parser, the ID values inside elements
> are reported as duplicate and I get errors like this reported:
> {code}Message: cvc-type.3.1.3: The value 'thing122' of element 'uid' is not valid.{code}
> Looking at the method
> org.apache.xerces.impl.xs.XMLSchemaValidator.reset(XMLComponentManager),
> there is a fast return inside it:
> {code}if (!parser_settings) {
>     // parser settings have not been changed
>     fValidationManager.addValidationState(fValidationState);
>     // the node limit on the SecurityManager may have changed so need to refresh.
>     nodeFactory.reset();
>     // Re-parse external schema location properties.
>     XMLSchemaLoader.processExternalHints(
>         fExternalSchemas,
>         fExternalNoNamespaceSchema,
>         fLocationPairs,
>         fXSIErrorReporter.fErrorReporter);
>     return;
> }{code}
> and this means all the code which for example cleared the IDs cache:
> {code}// reset ID Context
> if (fIDContext != null) {
>     fIDContext.clear();
> }{code}
> is no longer executed.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

To unsubscribe, e-mail: j-dev-unsubscr...@xerces.apache.org
For additional commands, e-mail: j-dev-h...@xerces.apache.org
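The failure mode described in the report (an early return in reset() that skips clearing the ID cache) can be illustrated with a small toy model. This is only a sketch and not Xerces code: the class ToyIdValidator, its fastReturn switch and the settingsChanged parameter are hypothetical names, with settingsChanged standing in for the parser_settings flag quoted above.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of a reusable validator with an ID cache. NOT Xerces code. */
class ToyIdValidator {
    private final Set<String> seenIds = new HashSet<>();
    private final boolean fastReturn; // true mimics the early return in reset()

    ToyIdValidator(boolean fastReturn) {
        this.fastReturn = fastReturn;
    }

    /** Called before each document; settingsChanged plays the role of parser_settings. */
    void reset(boolean settingsChanged) {
        if (fastReturn && !settingsChanged) {
            return; // early exit: the cache below is never cleared
        }
        seenIds.clear();
    }

    /** Returns true when no ID in this document was seen before. */
    boolean validate(String... ids) {
        boolean valid = true;
        for (String id : ids) {
            if (!seenIds.add(id)) {
                valid = false; // ID already in the cache: reported as duplicate
            }
        }
        return valid;
    }
}

class ToyIdValidatorDemo {
    public static void main(String[] args) {
        ToyIdValidator buggy = new ToyIdValidator(true);
        buggy.reset(true);   // first parse: settings count as changed
        System.out.println(buggy.validate("thing122", "thing123")); // true
        buggy.reset(false);  // second parse: fast return, stale cache
        System.out.println(buggy.validate("thing122", "thing123")); // false

        ToyIdValidator fixed = new ToyIdValidator(false);
        fixed.reset(true);
        System.out.println(fixed.validate("thing122", "thing123")); // true
        fixed.reset(false);  // cache is cleared on every reset
        System.out.println(fixed.validate("thing122", "thing123")); // true
    }
}
```

In this model the same document is valid on the first pass and invalid on the second purely because of leftover state, which matches the symptom reported for the reused parser.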
[jira] [Commented] (XERCESJ-1754) XMLSchemaValidator reset no longer resets id validation caches
[ https://issues.apache.org/jira/browse/XERCESJ-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677606#comment-17677606 ]

Radu Coravu commented on XERCESJ-1754:
--------------------------------------

Hi Mukul,

I agree with your analysis.

I think one reason we reuse the parser is so that the XML Schema references are parsed only once, since that information stays inside the parser. We could probably have used a grammar pool instead.

We have a patched version of Xerces on our side, in which I commented out this entire block in XMLSchemaValidator:

{code}
//boolean parser_settings;
//try {
//    parser_settings = componentManager.getFeature(PARSER_SETTINGS);
//}
//catch (XMLConfigurationException e) {
//    parser_settings = true;
//}
//
//if (!parser_settings) {
//    // parser settings have not been changed
//    fValidationManager.addValidationState(fValidationState);
//    // the node limit on the SecurityManager may have changed so need to refresh.
//    nodeFactory.reset();
//    // Re-parse external schema location properties.
//    XMLSchemaLoader.processExternalHints(
//        fExternalSchemas,
//        fExternalNoNamespaceSchema,
//        fLocationPairs,
//        fXSIErrorReporter.fErrorReporter);
//    return;
//}
{code}
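The goal mentioned above, parsing the schema only once without reusing a stateful parser, can also be approached with the standard JAXP validation API, where a compiled javax.xml.validation.Schema is immutable and a new Validator is created for each document. The sketch below is an analogy under stated assumptions, not the Xerces grammar pool: it uses the JDK's default JAXP implementation with XML Schema 1.0, and the class CompiledSchemaCache and the inline schema and document are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/** Hypothetical helper: compiles a schema once, validates many documents against it. */
class CompiledSchemaCache {
    // Illustrative schema and document; they are not the attachments from the report.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='items'><xs:complexType><xs:sequence>"
        + "<xs:element name='uid' type='xs:ID' maxOccurs='unbounded'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";
    static final String XML = "<items><uid>thing122</uid><uid>thing123</uid></items>";

    private final Schema schema; // compiled once, safe to share

    CompiledSchemaCache(String xsdText) {
        this.schema = compile(xsdText);
    }

    private static Schema compile(String xsdText) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsdText)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema does not compile", e);
        }
    }

    /** Validates one document with a fresh Validator and returns the error messages. */
    List<String> validate(String xmlText) {
        List<String> problems = new ArrayList<>();
        Validator validator = schema.newValidator(); // per-document state lives here
        validator.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { /* ignored in this sketch */ }
            @Override public void error(SAXParseException e) { problems.add(e.getMessage()); }
            @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        try {
            validator.validate(new StreamSource(new StringReader(xmlText)));
        } catch (SAXException | IOException e) {
            throw new IllegalStateException("document could not be parsed", e);
        }
        return problems;
    }

    public static void main(String[] args) {
        CompiledSchemaCache cache = new CompiledSchemaCache(XSD);
        System.out.println(cache.validate(XML)); // first validation
        System.out.println(cache.validate(XML)); // second validation, same compiled schema
    }
}
```

Because each call creates its own Validator, no ID state is carried from one validation to the next, while the schema itself is still compiled only once.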
[jira] [Commented] (XERCESJ-1754) XMLSchemaValidator reset no longer resets id validation caches
[ https://issues.apache.org/jira/browse/XERCESJ-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677474#comment-17677474 ]

Mukul Gandhi commented on XERCESJ-1754:
---------------------------------------

In your original XML Schema document attached to this bug report (test.xsd), you specify the following on the top-most xs:schema element:

    ... xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning" vc:minVersion="1.1" ...

I believe this alone is not sufficient for XercesJ to invoke XML Schema 1.1 validation. With XercesJ, XML Schema 1.1 validation has to be requested through an API provided by XercesJ, for example as follows:

    private static String saxParserFactoryClass = "org.apache.xerces.jaxp.SAXParserFactoryImpl";

    SchemaFactory xsdSchemaFactory = SchemaFactory.newInstance(Constants.W3C_XML_SCHEMA11_NS_URI);
    Schema xsdSchema = xsdSchemaFactory.newSchema(new StreamSource(xsdDocument));
    SAXParserFactory saxParserFactory = SAXParserFactory.newInstance(saxParserFactoryClass, null);
    saxParserFactory.setNamespaceAware(true);
    saxParserFactory.setSchema(xsdSchema);
    SAXParser saxParser = saxParserFactory.newSAXParser();
    saxParser.parse(xmlDocument, new XmlParseErrorHandler());

I believe that with XercesJ we cannot select XML Schema 1.1 validation when parsing and validating directly with XMLReaderFactory and XMLReader (I think only XML Schema 1.0 validation is possible via this method). Using XMLReader essentially means handling XML document events the way an XML SAX parser does. In fact, a proper Java SAXParser (like org.apache.xerces.parsers.SAXParser) has an underlying XMLReader that does the SAX-style XML document event handling.

Mainly, I wish to suggest that you use the real XercesJ SAX parser (i.e., org.apache.xerces.parsers.SAXParser) to select XML Schema 1.1 validation via XercesJ, using something like the code pattern I've suggested above.

About the actual bug reported in this thread: I've verified it, and I agree that it seems to be a bug in XercesJ's XML Schema 1.1 validator when used with XercesJ's SAX parser. That is, with two consecutive Java statements like the following,

    saxParser.parse(xmlDocument, new XmlParseErrorHandler());
    saxParser.parse(xmlDocument, new XmlParseErrorHandler());

the second statement produces an XML Schema validation failure (but the first one doesn't).

As a workaround, you can create a new XML SAX parser instance for the second and all subsequent parse(..) calls:

    saxParser.parse(xmlDocument, new XmlParseErrorHandler());
    saxParser = saxParserFactory.newSAXParser();
    saxParser.parse(xmlDocument, new XmlParseErrorHandler());

The above workaround may not be very elegant, but I don't see much performance overhead with it.

I could also verify that removing the 'return;' statement from the XercesJ XMLSchemaValidator code that you've cited seems to solve this bug for XML Schema 1.1 validation.

I could also verify that XercesJ's XML Schema 1.0 validator (the one available within XercesJ's XML Schema 1.1 distribution) is not affected by this bug.

Perhaps someone could try fixing the issues mentioned in this bug report in the right way.
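The workaround described above (keep the factory, obtain a new SAXParser for each parse) can be sketched as a self-contained program. This sketch makes several assumptions: it runs against the JDK's default JAXP implementation with XML Schema 1.0 rather than the XercesJ XML Schema 1.1 distribution, the class FreshParserPerParse and its method names are hypothetical, and the inline schema and document only stand in for test.xsd and test.xml.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class FreshParserPerParse {
    // Illustrative stand-ins for the attached test.xsd and test.xml.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='items'><xs:complexType><xs:sequence>"
        + "<xs:element name='uid' type='xs:ID' maxOccurs='unbounded'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";
    static final String XML = "<items><uid>thing122</uid><uid>thing123</uid></items>";

    /** Builds a namespace-aware factory whose parsers validate against the schema. */
    static SAXParserFactory newValidatingFactory(String xsdText) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsdText)));
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setSchema(schema); // schema is compiled once and shared
            return factory;
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema does not compile", e);
        }
    }

    /** Parses with a brand-new SAXParser and returns the validation error messages. */
    static List<String> parseWithFreshParser(SAXParserFactory factory, String xmlText) {
        List<String> errors = new ArrayList<>();
        try {
            SAXParser parser = factory.newSAXParser(); // new instance, no leftover state
            parser.parse(new InputSource(new StringReader(xmlText)), new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    errors.add(e.getMessage());
                }
            });
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
        return errors;
    }

    public static void main(String[] args) {
        SAXParserFactory factory = newValidatingFactory(XSD);
        System.out.println(parseWithFreshParser(factory, XML)); // first parse
        System.out.println(parseWithFreshParser(factory, XML)); // second parse, new parser
    }
}
```

The schema is compiled once when the factory is configured, so only the comparatively cheap parser creation is repeated on each call.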
[jira] [Commented] (XERCESJ-1754) XMLSchemaValidator reset no longer resets id validation caches
[ https://issues.apache.org/jira/browse/XERCESJ-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677160#comment-17677160 ]

Radu Coravu commented on XERCESJ-1754:
--------------------------------------

As I said in my initial report, I validate an XML with an XML Schema 1.1 file. So yes, I'm using the XML Schema 1.1 validator.
[jira] [Commented] (XERCESJ-1754) XMLSchemaValidator reset no longer resets id validation caches
[ https://issues.apache.org/jira/browse/XERCESJ-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17676910#comment-17676910 ]

Mukul Gandhi commented on XERCESJ-1754:
---------------------------------------

I've written a Java program with the following outline to debug this issue:

    String xmlParserClass = "org.apache.xerces.parsers.SAXParser";
    String schemaFeature = "http://apache.org/xml/features/validation/schema";
    String xmlDocument = "test.xml";

    XMLReader xmlReader = XMLReaderFactory.createXMLReader(xmlParserClass);
    xmlReader.setFeature(schemaFeature, true);
    xmlReader.setErrorHandler(new XmlParseErrorHandler());
    xmlReader.parse(xmlDocument);   // parse and validate (a)
    xmlReader.parse(xmlDocument);   // parse and validate, again (b)

I used your attached XML instance and XML Schema documents, but I ran the XML Schema validation in 1.0 mode. That is, I modified the XML instance document that you shared as follows:

    ...="http://example.org/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://example.org/ test.xsd">

As the commented lines (a) and (b) above show, we do XML Schema validation twice (with the XML instance and XML Schema documents remaining the same), reusing the same XMLReader object instance. On both occasions, (a) and (b), I don't get any XML Schema validation errors.

Out of curiosity, if I add a small delay between the xmlReader.parse calls as below, I still don't get any XML Schema validation errors:

    Thread.sleep(1000 * 5);

Do you think this issue is with XercesJ's XML Schema 1.1 validator, and not with the 1.0 validator?
[jira] [Commented] (XERCESJ-1754) XMLSchemaValidator reset no longer resets id validation caches
[ https://issues.apache.org/jira/browse/XERCESJ-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17676651#comment-17676651 ]

Radu Coravu commented on XERCESJ-1754:
--------------------------------------

We develop an XML editing tool, so while the end user is editing the XML document we validate it each time the XML changes.

So in our case the XML Schema remains the same, but we use the same "org.xml.sax.XMLReader" instance (the same object), created with full schema validation features, to re-parse the XML document each time it is changed.
[jira] [Commented] (XERCESJ-1754) XMLSchemaValidator reset no longer resets id validation caches
[ https://issues.apache.org/jira/browse/XERCESJ-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17676639#comment-17676639 ]

Mukul Gandhi commented on XERCESJ-1754:
---------------------------------------

You wrote, "We perform the same validation multiple times re-using the same parser".

Within your program, on the different repetitions of XML Schema validation, do you use the same XML Schema document and XML instance document every time?

Let's say we have the following outline of an XML Schema validation Java program (do you use the same kind of code pattern?):

    DocumentBuilderFactory parserFactory = DocumentBuilderFactory.newInstance();
    parserFactory.setNamespaceAware(true);
    DocumentBuilder parser = parserFactory.newDocumentBuilder();
    Document document = parser.parse(new File("test.xml"));

    SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
    Source schemaFile = new StreamSource(new File("test.xsd"));
    Schema schema = factory.newSchema(schemaFile);
    Validator validator = schema.newValidator();
    validator.validate(new DOMSource(document));

When you say that you are "re-using the same parser", do you reuse, for example, the Java object 'DocumentBuilder parser'? Are you using an XML DOM parser within your program, or a different one? Are you also reusing any of the following Java objects: 'Schema schema', 'Validator validator'? Are you using the JAXP API as cited in this message?

Perhaps, once the answers to the above questions are known, someone may attempt to solve the issues mentioned in this bug report.
[jira] [Commented] (XERCESJ-1754) XMLSchemaValidator reset no longer resets id validation caches
[ https://issues.apache.org/jira/browse/XERCESJ-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17676484#comment-17676484 ]

Radu Coravu commented on XERCESJ-1754:
--------------------------------------

I'm afraid our code which does the validation is too complex to easily simplify.

We perform the same validation multiple times re-using the same parser. The XMLSchemaValidator gets reset and reused but, as I said, because the reset method has a fast return, not all caches are reset.
[jira] [Commented] (XERCESJ-1754) XMLSchemaValidator reset no longer resets id validation caches
[ https://issues.apache.org/jira/browse/XERCESJ-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17676249#comment-17676249 ]

Mukul Gandhi commented on XERCESJ-1754:
---------------------------------------

I guess you have a Java program in which you are doing this XML Schema validation. If you can share it, that would be helpful for debugging this issue.