mbeckerle commented on code in PR #908:
URL: https://github.com/apache/daffodil/pull/908#discussion_r1067292157
##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/infoset/SAXInfosetInputter.scala:
##########
@@ -21,79 +21,70 @@ import java.net.URI
import java.net.URISyntaxException
import org.apache.daffodil.api.DFDL
-import org.apache.daffodil.api.DFDL.DaffodilUnhandledSAXException
-import org.apache.daffodil.api.DFDL.DaffodilUnparseErrorSAXException
-import org.apache.daffodil.api.DFDL.SAXInfosetEvent
import org.apache.daffodil.dpath.NodeInfo
import org.apache.daffodil.exceptions.Assert
import org.apache.daffodil.infoset.InfosetInputterEventType.EndDocument
import org.apache.daffodil.infoset.InfosetInputterEventType.StartElement
+import org.apache.daffodil.processors.DaffodilUnparseContentHandler
+import org.apache.daffodil.util.Coroutine
+import org.apache.daffodil.util.Maybe
import org.apache.daffodil.util.Maybe.One
+import org.apache.daffodil.util.Maybe.Nope
import org.apache.daffodil.util.MaybeBoolean
import org.apache.daffodil.util.Misc
import org.apache.daffodil.xml.XMLUtils
/**
- * The SAXInfosetInputter consumes SAXInfosetEvent objects from the
DaffodilUnparseContentHandler
- * class and converts them to events that the DataProcessor unparse can use.
This class contains an
- * array of batched SAXInfosetEvent objects that it receives from the
contentHandler and the index
- * of the current element being processed.
+ * The SAXInfosetInputter consumes SAXInfosetEvent objects from the
+ * DaffodilUnparseContentHandler class and converts them to events that the
+ * DataProcessor unparse can use.
+ *
+ * See the DaffodilUnparseContentHandler for a detailed description of how
these two
+ * classes interact.
*
- * This class, together with the SAXInfosetInputter, uses coroutines to ensure
that a batch of events
- * (based on the tunable saxUnparseEventBatchSize) can be passed from the
former to the latter.
- * The following is the general process:
- *
- * - the run method is called, with the first batch of events, starting with
the StartDocument event,
- * already loaded on the inputter's queue.
- * This is collected and stored in the batchedInfosetEvents member, and the
currentIndex is set to 0
- * - The dp.unparse method is called, and it calls hasNext to make sure an
event exists to be
- * processed and then queries the event at currentIndex. The hasNext call also
checks that there is
- * a next event to be processed (currentIndex+1), and if not, queues the next
batch of events by
- * transferring control to the contentHandler so it can load them.
- * - After it is done with the current event, it calls inputter.next to get
the next event, and that
- * increments the currentIndex and cleans out the event at the previous index
- * - This process continues until the event at currentIndex either contains an
EndDocument event or
- * the currentIndex is the last in the batch. If it is the former, the
endDocumentReceived flag is
- * set to true and hasNext will return false. If it is the latter, the next
batch of events will be
- * queued by transferring control to the contentHandler so it can load them.
- * - This ends the unparse process, and the unparseResult and/or any Errors
are set on a single element
- * array containing response events. We call resumeFinal passing along that
array, terminating this
- * thread and resuming the contentHandler for the last time.
- *
- * @param unparseContentHandler producer coroutine that sends the
SAXInfosetEvent to this class
- * @param dp dataprocessor that we use to kickstart the unparse process and
that consumes the
- * currentEvent
- * @param output outputChannel of choice where the unparsed data is stored
+ * @param unparseContentHandler producer coroutine that sends the
SAXInfosetEvents to this class
+ * @param dp DataProcessor that we use to kickstart the unparse process to
consumes the events
+ * @param output OutputChannel where the unparsed data is written
+ * @param resolveRelaiveInfosetBlobURIs if true, elements with type xs:anyURI
type and a
+ * relativeURI are resolved relative to the classpath. This should only be
true when used
+ * via a TDMLRunner or similar testing infrastructure
*/
class SAXInfosetInputter(
- unparseContentHandler: DFDL.DaffodilUnparseContentHandler,
+ unparseContentHandler: DaffodilUnparseContentHandler,
dp: DFDL.DataProcessor,
- output: DFDL.Output)
- extends InfosetInputter with DFDL.ConsumerCoroutine {
+ output: DFDL.Output,
+ resolveRelativeInfosetBlobURIs: Boolean)
+ extends InfosetInputter with Coroutine[Array[SAXInfosetEvent]] {
/**
- * allows support for converting relative URIs in Blobs to absolute URIs,
this is necessary
- * for running TDML tests as they allow relative URIs. Because Daffodil
proper only uses
- * absolute URIs, we hide this functionality behind this flag. It can be set
to true by calling
- * the
unparseContentHandler.enableInputterResolutionOfRelativeInfosetBlobURIs(),
which calls the
- * inputter's enableResolutionOfRelativeInfosetBlobURIs() function to set
the below variable to true
+ * The index into batchedInfosetEvents that the InfosetInputter is currently
returning
+ * information about to the unparse()
*/
- private var resolveRelativeInfosetBlobURIs: Boolean = false
-
- private var endDocumentReceived = false
private var currentIndex: Int = 0
+
+ /**
+ * The array of batched infoset events received from the ContentHandler.
+ */
Review Comment:
This idea seems like it would be of real value: if the entire set of events
fits in the first batch, bypass the whole thread/coroutine system. That case
covers a large percentage of our use cases.
On the other hand, threads are now pooled, so the overhead of creating and
starting one thread, once, should be pretty minimal.
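
For concreteness, here is a rough sketch of what such a bypass could look
like. All names here (FirstBatchBypass, Event, isComplete, events) are made up
for illustration and are not from this PR; moreBatches is only a stand-in for
the coroutine/thread handoff, which is assumed to be started lazily.

```scala
// Hypothetical sketch only: none of these names come from the PR.
object FirstBatchBypass {
  sealed trait Event
  case object StartDocument extends Event
  final case class Element(name: String) extends Event
  case object EndDocument extends Event

  /** True when the first batch already ends with EndDocument. */
  def isComplete(firstBatch: Seq[Event]): Boolean =
    firstBatch.lastOption.contains(EndDocument)

  /**
   * Returns the events the unparser would consume.
   * `moreBatches` stands in for the coroutine handoff; it is passed by name
   * and is never evaluated when the first batch holds the whole document.
   */
  def events(firstBatch: Seq[Event], moreBatches: => Iterator[Event]): Iterator[Event] =
    if (isComplete(firstBatch)) firstBatch.iterator
    else firstBatch.iterator ++ moreBatches
}
```

In this shape the small-document case never touches the handoff at all, which
is the saving in question. Whether that is worth the extra code path given
pooled threads is the open question.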