mbeckerle commented on a change in pull request #431:
URL: https://github.com/apache/incubator-daffodil/pull/431#discussion_r512107443
##########
File path:
daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/DataProcessor.scala
##########
@@ -691,41 +702,38 @@ class DataProcessor private (
class ParseResult(dp: DataProcessor, override val resultState: PState)
extends DFDL.ParseResult
- with WithDiagnosticsImpl
- with ErrorHandler {
+ with WithDiagnosticsImpl {
/**
- * To be successful here, we need to capture xerces parse/validation
+ * To be successful here, we need to capture parse/validation
* errors and add them to the Diagnostics list in the PState.
*
- * @param state the initial parse state.
+ * @param bytes the parsed Infoset
*/
def validateResult(bytes: Array[Byte]): Unit = {
Assert.usage(resultState.processorStatus eq Success)
- val schemaURIStrings = resultState.infoset.asInstanceOf[InfosetElement].runtimeData.schemaURIStringsForFullValidation
- try {
- val bis = new java.io.ByteArrayInputStream(bytes)
- Validator.validateXMLSources(schemaURIStrings, bis, this)
- } catch {
- //
- // Some SAX Parse errors are thrown even if you specify an error handler to the
- // validator.
- //
- // So we also need this catch
- //
- case e: SAXException =>
- resultState.validationErrorNoContext(e)
+
+ val v = dp.validationMode match {
Review comment:
If you hoist this val v into a DataProcessor member, lazy val validator = ...,
then it will be computed once, when first used, and reused by that data
processor for every subsequent parse that requests validation.
You'd then pass it down to the actual validation.
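
A minimal sketch of the hoisting idea, assuming simplified stand-in types: ValidationMode, XmlValidator, NoValidation, SketchDataProcessor and buildValidator are hypothetical names for illustration, not Daffodil's actual API. The validator is chosen by validation mode once, on first use, and then reused by every parse on the same processor.

```scala
// Hypothetical stand-ins for the types in this PR; not Daffodil's actual API.
sealed trait ValidationMode
object ValidationMode {
  case object Off extends ValidationMode
  case object Full extends ValidationMode
}

trait XmlValidator {
  /** Returns validation error messages for the given XML bytes; empty means valid. */
  def validateXML(bytes: Array[Byte]): Seq[String]
}

object NoValidation extends XmlValidator {
  def validateXML(bytes: Array[Byte]): Seq[String] = Nil
}

// Hypothetical: `buildValidator` stands in for whatever expensive setup a mode
// needs, such as loading an XSD or compiling Schematron into XSLT.
final class SketchDataProcessor(
    val validationMode: ValidationMode,
    buildValidator: ValidationMode => XmlValidator) {

  // Evaluated on first access only. Scala lazy vals initialize under a lock,
  // so concurrent first parses still build the validator exactly once.
  lazy val validator: XmlValidator = validationMode match {
    case ValidationMode.Off => NoValidation
    case mode               => buildValidator(mode)
  }

  // Each parse reuses the cached validator instead of rebuilding it.
  def validateResult(bytes: Array[Byte]): Seq[String] = validator.validateXML(bytes)
}
```

The lazy val keeps construction off the DataProcessor's creation path, so processors that never validate pay nothing.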
For Xerces, I'm not sure this will matter much, because we may have to
construct and initialize a whole Xerces instance, and read in the schema,
for every thread. Maybe not; it depends on whether a Xerces validator, once
initialized, can be called from multiple threads and just return a result.
Still, even if a separate Xerces object is required per thread, this would
happen only once per thread per DataProcessor. That helps when threads are
long-lived and process the same kind of data over and over, which is typical
of message processing.
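
A sketch of the per-thread option for XSD validation, assuming the JDK's javax.xml.validation API (whose built-in implementation is derived from Xerces) rather than Daffodil's own validator code; the class name SharedXsdValidator and its validateXML method are hypothetical. Per the Javadoc, a compiled Schema is thread-safe and may be shared, while a javax.xml.validation.Validator is neither thread-safe nor reentrant, so the schema is read once and each thread gets its own Validator from a ThreadLocal.

```scala
import java.io.{ByteArrayInputStream, StringReader}
import javax.xml.XMLConstants
import javax.xml.transform.stream.StreamSource
import javax.xml.validation.{Schema, SchemaFactory, Validator}
import org.xml.sax.{ErrorHandler, SAXException, SAXParseException}
import scala.collection.mutable.ListBuffer

// Hypothetical helper: compile the XSD once, share the thread-safe Schema,
// and give each thread its own (non-thread-safe) Validator.
final class SharedXsdValidator(xsd: String) {
  // Read and compiled exactly once per instance.
  private val schema: Schema =
    SchemaFactory
      .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
      .newSchema(new StreamSource(new StringReader(xsd)))

  // One Validator per thread, created lazily on that thread's first call.
  private val perThread = new ThreadLocal[Validator] {
    override def initialValue(): Validator = schema.newValidator()
  }

  /** Returns validation messages for `bytes`; empty means valid. */
  def validateXML(bytes: Array[Byte]): Seq[String] = {
    val errors = ListBuffer.empty[String]
    val v = perThread.get()
    v.setErrorHandler(new ErrorHandler {
      def warning(e: SAXParseException): Unit = ()
      def error(e: SAXParseException): Unit = errors += e.getMessage
      def fatalError(e: SAXParseException): Unit = errors += e.getMessage
    })
    try v.validate(new StreamSource(new ByteArrayInputStream(bytes)))
    catch {
      // Some SAX errors are thrown even when an error handler is set.
      case e: SAXException => errors += e.getMessage
    }
    errors.toList
  }
}
```

With the Schema shared, the per-thread cost is only creating a Validator, not re-reading the schema.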
But for Schematron this could be a big win, because the whole conversion of
the Schematron file into XSLT, and the construction of the transformer from
that XSLT, might be done exactly once and reused. I am not sure whether the
transformer object is stateless. If it is not, it would have to be a
ThreadLocal object, but at least the generation of the XSLT would be fully
shared and done only once.
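
A sketch of the Schematron case, assuming the Schematron-to-XSLT conversion has already produced a stylesheet string (compiledXslt is a hypothetical input; generating it is out of scope here) and that the JDK's javax.xml.transform API runs it. Per the Javadoc, a Templates object is thread-safe while a Transformer must not be used by multiple threads at once, so the stylesheet is compiled once into Templates and each thread gets its own Transformer. The class name SchematronXsltRunner is hypothetical.

```scala
import java.io.{ByteArrayInputStream, StringReader, StringWriter}
import javax.xml.transform.{Templates, Transformer, TransformerFactory}
import javax.xml.transform.stream.{StreamResult, StreamSource}

// Hypothetical: `compiledXslt` is the stylesheet a Schematron toolchain
// generated from the .sch file.
final class SchematronXsltRunner(compiledXslt: String) {
  // Compiled once; Templates is documented as thread-safe and shareable.
  private val templates: Templates =
    TransformerFactory
      .newInstance()
      .newTemplates(new StreamSource(new StringReader(compiledXslt)))

  // Transformer is stateful, so keep one per thread.
  private val perThread = new ThreadLocal[Transformer] {
    override def initialValue(): Transformer = templates.newTransformer()
  }

  /** Runs the compiled stylesheet over `bytes` and returns its output, e.g. a report. */
  def transform(bytes: Array[Byte]): String = {
    val out = new StringWriter()
    perThread.get().transform(new StreamSource(new ByteArrayInputStream(bytes)), new StreamResult(out))
    out.toString
  }
}
```

The expensive steps (generating and compiling the XSLT) happen once per runner; only the cheap newTransformer call is repeated, once per thread.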
v.validateXML will then be responsible for its own thread safety, since parse
can be called on many threads simultaneously. That means v.validateXML will
have to use ThreadLocal objects, as expected, for anything stateful.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.