tuxji commented on code in PR #908:
URL: https://github.com/apache/daffodil/pull/908#discussion_r1067529104
##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/api/DFDLParserUnparser.scala:
##########
@@ -226,84 +221,22 @@ object DFDL {
}
trait DaffodilUnparseContentHandler
- extends ProducerCoroutine
- with org.xml.sax.ContentHandler {
+ extends org.xml.sax.ContentHandler {
def getUnparseResult: UnparseResult
- def enableInputterResolutionOfRelativeInfosetBlobURIs(): Unit
+ def enableResolutionOfRelativeInfosetBlobURIs(): Unit
}
/**
* Used in SAXInfosetEvent.causeError to identify when an
unparseResult.isError is true
*/
class DaffodilUnparseErrorSAXException(unparseResult: UnparseResult)
- extends
org.xml.sax.SAXException(unparseResult.getDiagnostics.mkString("\n"))
+ extends SAXException(unparseResult.getDiagnostics.mkString("\n"))
/**
* Used in SAXInfosetEvent.causeError to identify when an unexpected error
occurs
Review Comment:
You've moved SAXInfosetEvent from api/DFDLParserUnparser.scala to
infoset/SAXInfosetInputter.scala, but these scaladocs still mention
SAXInfosetEvent. Both DaffodilUnparseErrorSAXException and
DaffodilUnhandledSAXException just extend SAXException, so their own names are
the only extra info they communicate to users. It looks like Daffodil's apis
(runtime1's api, daffodil-japi, daffodil-sapi) want to capture SAXException and
rethrow it as either DaffodilUnparseErrorSAXException or
DaffodilUnhandledSAXException. I'm not sure why it's so important that
Daffodil users receive instances of these exceptions instead of generic
SAXException instances, so you might want to make the scaladocs explain why
these exceptions matter to users rather than mention which internal Daffodil
code uses these exceptions.
##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/DaffodilUnparseContentHandler.scala:
##########
@@ -28,156 +28,224 @@ import org.xml.sax.Locator
import org.apache.daffodil.api.DFDL
import org.apache.daffodil.api.DFDL.DaffodilUnhandledSAXException
import org.apache.daffodil.api.DFDL.DaffodilUnparseErrorSAXException
-import org.apache.daffodil.api.DFDL.SAXInfosetEvent
import org.apache.daffodil.exceptions.Assert
import org.apache.daffodil.infoset.InfosetInputterEventType.EndDocument
import org.apache.daffodil.infoset.InfosetInputterEventType.EndElement
import org.apache.daffodil.infoset.InfosetInputterEventType.StartDocument
import org.apache.daffodil.infoset.InfosetInputterEventType.StartElement
+import org.apache.daffodil.infoset.SAXInfosetEvent
import org.apache.daffodil.infoset.SAXInfosetInputter
import org.apache.daffodil.util.MStackOf
+import org.apache.daffodil.util.MainCoroutine
import org.apache.daffodil.util.Maybe
import org.apache.daffodil.util.Maybe.Nope
import org.apache.daffodil.util.Maybe.One
import org.apache.daffodil.util.Misc
/**
- * DaffodilUnparseContentHandler produces SAXInfosetEvent objects for the
SAXInfosetInputter to
- * consume and convert to events that the Dataprocessor unparse can use. The
SAXInfosetEvent object
- * is built from information that is passed to the ContentHandler from an
XMLReader parser. In
- * order to receive the uri and prefix information from the XMLReader, the
XMLReader must have
- * support for XML Namespaces
+ * Handle and unparse XMLReader SAX events using a provided DataProcessor and
+ * OutputChannel
*
- * This class, together with the SAXInfosetInputter, uses coroutines to ensure
that a batch of events
- * (based on the tunable saxUnparseEventBatchSize) can be passed from the
former to the latter.
- * The following is the general process:
+ * Note: XMLReaders using this as their ContentHandler must have support for
XML
+ * namespaces so that we provided namespace URI and prefix information that
Daffodil
+ * requires to unparse.
*
- * - an external call is made to parse an XML Document
- * - this class receives a StartDocument call, which is the first
SAXInfosetEvent that should be
- * sent to the SAXInfosetInputter. That event is put onto an array of
SAXInfosetEvents of size the
- * saxUnparseEventBatchSize tunable. Once the array is full, it is put on the
inputter's queue,
- * this thread is paused, and that inputter's thread is run
- * - when the SAXInfosetInputter is done processing that batch and is ready
for a new batch, it
- * sends a 1 element array with the last completed event via the coroutine
system, which loads it on
- * the contentHandler's queue, which restarts this thread and pauses that one.
In the expected case,
- * the single element array will contain no new information until the unparse
complete. In the case of
- * an unexpected error though, it will contain error information
- * - this process continues until the EndDocument SAXInfosetEvent is loaded
into the batch.
- * Once that SAXInfosetEvent is processed by the SAXInfosetInputter, it
signals the end of batched
- * events coming from the contentHandler. This ends the unparseProcess and
returns the event with
- * the unparseResult and/or any error
- * information
+ * The SAX ContentHandler API is push-based, but the Daffodil InfosetInputter
unparse
+ * API is pull-based, so these two API's are at odds with one another. To link
the
+ * two, we create two classes that implement a coroutine-like API to
communicate and
+ * ensure that the push and pull sides of the two APIs never run at the same
time
+ * (see Coroutine.scala for implementation details). The main coroutine or
"event
+ * queuer" is this DaffodilUnparseContentHandler and runs on the same thread
as an
+ * XMLReader to handle and batch SAX events. The worker coroutine or "event
puller" is
+ * an instance of the SAXInfosetInputter which calls the actual unparse()
function to
+ * query these batched SAX events and unparse data.
*
- * @param dp dataprocessor object that will be used to call the parse
- * @param output outputChannel of choice where the unparsed data is stored
+ * Below is a description of how this class is used and they communicate to
unparse
+ * using the SAX XMLReader API:
+ *
+ * 1. A DaffodilUnparseContentHandler instance is created, which initializes a
+ * SAXInfosetInputter instance.
+ *
+ * 2. An XMLReader instance is created and configured to use the
+ * DaffodilUnparseContentHandler to handle its SAX events.
+ *
+ * 3. The DaffodilUnparseContentHandler handles events from the XMLReader,
gathers
+ * the necessary information from those events, and fills out an array of
+ * SAXInfosetEvent objects, called a "batch".
+ *
+ * 4. When a full batch has been gathered, or an endDocument() event is
handled, the
+ * DaffodilUnparseContentHandler calls resume() to pause execution and
start the
+ * SAXInfosetInputter coroutine, sending it the batch of events.
+ *
+ * 5. Since this is the first time the SAXInfosetInputter has been given
control, its
+ * coroutine Thread is started and its run() function called. This calls
+ * waitResume() to receive the first batch of events. It then calls the
+ * DataProcessor unparse() function to begin the unparse.
+ *
+ * 6. The SAXInfosetInputter functions are called by the unparse, which reads
the
+ * batched events and provides the necessary information to unparse the
infoset.
+ * Once the SAXInfosetInputter has unparsed all batched events and needs a
new
+ * event in the next() function, it calls resume() to pause execution and
resume
+ * the DaffodilUnparseContentHandler coroutine, sending it Nope to signify
it
+ * needs more events.
+ *
+ * 7. The DaffodilUnparseContentHandler resumes and again continues to handle
SAX
+ * events and fill out the batched events array until it is full or
endDocument()
+ * is handled, at which point it calls resume() to pause and send the new
batch of
+ * events to the SAXInfosetInputter where it resumes control.
+ *
+ * 8. Steps 6 and 7 repeat until the SAXInfosetInputter signals back to the
+ * DaffodilUnparseContentHandler that it is complete calling resumeFinal()
and
+ * sending a One object containing either an UnparseResult or an Exception.
At
+ * this point, the SAXInfosetInputter is complete and provide control back
to the
+ * DaffodilUnparseContentHandler--the SAXInfosetInputter coroutine is done.
+ *
+ * 9. The DaffodilUnparseContentHandler resumes, recieves and examines the
result
+ * from the SAXInfosetInputter, and either makes the UnparseResult
available to
+ * the XMLReader or throws a SAXException if there was an error.
+ *
+ * @param dp DataProcessor object that will be used to start the unparse
+ * @param output OutputChannel of where the unparsed data is written
*/
class DaffodilUnparseContentHandler(
dp: DFDL.DataProcessor,
output: DFDL.Output)
- extends DFDL.DaffodilUnparseContentHandler {
- private lazy val inputter = new SAXInfosetInputter(this, dp, output)
- private var unparseResult: DFDL.UnparseResult = _
+ extends MainCoroutine[Maybe[Either[Exception, DFDL.UnparseResult]]]
+ with DFDL.DaffodilUnparseContentHandler {
Review Comment:
Just making another observation. This line changes
DaffodilUnparseContentHandler from MainCoroutine[Array[SAXInfosetEvent]] to
MainCoroutine[Maybe[Either[Exception, DFDL.UnparseResult]]].
SAXInfosetInputter is still Coroutine[Array[SAXInfosetEvent]]. What this means
is that a pair of coroutines can send data to each other without the data
having to be the same type on both ends:
```scala
// dataToReceive and dataToSend can be different types!
val dataToReceive = this.resume(otherCoroutine, dataToSend)
```
This is valuable to know - maybe worth adding to a comment somewhere.
##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/infoset/SAXInfosetInputter.scala:
##########
@@ -21,79 +21,70 @@ import java.net.URI
import java.net.URISyntaxException
import org.apache.daffodil.api.DFDL
-import org.apache.daffodil.api.DFDL.DaffodilUnhandledSAXException
-import org.apache.daffodil.api.DFDL.DaffodilUnparseErrorSAXException
-import org.apache.daffodil.api.DFDL.SAXInfosetEvent
import org.apache.daffodil.dpath.NodeInfo
import org.apache.daffodil.exceptions.Assert
import org.apache.daffodil.infoset.InfosetInputterEventType.EndDocument
import org.apache.daffodil.infoset.InfosetInputterEventType.StartElement
+import org.apache.daffodil.processors.DaffodilUnparseContentHandler
+import org.apache.daffodil.util.Coroutine
+import org.apache.daffodil.util.Maybe
import org.apache.daffodil.util.Maybe.One
+import org.apache.daffodil.util.Maybe.Nope
import org.apache.daffodil.util.MaybeBoolean
import org.apache.daffodil.util.Misc
import org.apache.daffodil.xml.XMLUtils
/**
- * The SAXInfosetInputter consumes SAXInfosetEvent objects from the
DaffodilUnparseContentHandler
- * class and converts them to events that the DataProcessor unparse can use.
This class contains an
- * array of batched SAXInfosetEvent objects that it receives from the
contentHandler and the index
- * of the current element being processed.
+ * The SAXInfosetInputter worker coroutine receives batches of SAXInfosetEvent
+ * objects from a DaffodilUnparseContentHandler main coroutine and calls the
+ * unparse() function to provide these events and unparse the data.
+ *
+ * See the DaffodilUnparseContentHandler for a detailed description of how
these two
+ * classes interact.
Review Comment:
This is the place that might send a programmer to the wrong place
(api.DFDL.DaffodilUnparseContentHandler instead of
processors.DaffodilUnparseContentHandler).
##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/api/DFDLParserUnparser.scala:
##########
@@ -226,84 +221,22 @@ object DFDL {
}
trait DaffodilUnparseContentHandler
- extends ProducerCoroutine
- with org.xml.sax.ContentHandler {
+ extends org.xml.sax.ContentHandler {
def getUnparseResult: UnparseResult
- def enableInputterResolutionOfRelativeInfosetBlobURIs(): Unit
+ def enableResolutionOfRelativeInfosetBlobURIs(): Unit
}
Review Comment:
I'm not very happy that Daffodil has both a
DFDL.DaffodilUnparseContentHandler trait and a
processors.DaffodilUnparseContentHandler class. I'm especially not happy that
this DFDL.DaffodilUnparseContentHandler trait has no scaladocs to tell
programmers that the real work is done in the
processors.DaffodilUnparseContentHandler class. SAXInfosetInputter's scaladocs
tells programmers to see DaffodilUnparseContentHandler for a detailed
description of how the DaffodilUnparseContentHandler and SAXInfosetInputter
classes interact. I fear a programmer might come here by mistake and get
puzzled why this DaffodilUnparseContentHandler doesn't have any scaladocs or
real code. If we really need both the trait and the class (do we really need
both?), the trait needs its own scaladocs.
I noticed that one half of grep -w ContentHandler's hits have an import
org.xml.sax.ContentHandler while the other half don't bother to import
ContentHandler. I'm just mentioning that fact; I don't have a strong opinion
whether ContentHandler should be imported or qualified although it was your
change at line 233 using SAXException instead of org.xml.sax.SAXException which
drew my eye to line 224 still using org.xml.sax.ContentHandler.
##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/infoset/SAXInfosetInputter.scala:
##########
@@ -21,79 +21,70 @@ import java.net.URI
import java.net.URISyntaxException
import org.apache.daffodil.api.DFDL
-import org.apache.daffodil.api.DFDL.DaffodilUnhandledSAXException
-import org.apache.daffodil.api.DFDL.DaffodilUnparseErrorSAXException
-import org.apache.daffodil.api.DFDL.SAXInfosetEvent
import org.apache.daffodil.dpath.NodeInfo
import org.apache.daffodil.exceptions.Assert
import org.apache.daffodil.infoset.InfosetInputterEventType.EndDocument
import org.apache.daffodil.infoset.InfosetInputterEventType.StartElement
+import org.apache.daffodil.processors.DaffodilUnparseContentHandler
+import org.apache.daffodil.util.Coroutine
+import org.apache.daffodil.util.Maybe
import org.apache.daffodil.util.Maybe.One
+import org.apache.daffodil.util.Maybe.Nope
import org.apache.daffodil.util.MaybeBoolean
import org.apache.daffodil.util.Misc
import org.apache.daffodil.xml.XMLUtils
/**
- * The SAXInfosetInputter consumes SAXInfosetEvent objects from the
DaffodilUnparseContentHandler
- * class and converts them to events that the DataProcessor unparse can use.
This class contains an
- * array of batched SAXInfosetEvent objects that it receives from the
contentHandler and the index
- * of the current element being processed.
+ * The SAXInfosetInputter worker coroutine receives batches of SAXInfosetEvent
+ * objects from a DaffodilUnparseContentHandler main coroutine and calls the
+ * unparse() function to provide these events and unparse the data.
+ *
+ * See the DaffodilUnparseContentHandler for a detailed description of how
these two
+ * classes interact.
*
- * This class, together with the SAXInfosetInputter, uses coroutines to ensure
that a batch of events
- * (based on the tunable saxUnparseEventBatchSize) can be passed from the
former to the latter.
- * The following is the general process:
- *
- * - the run method is called, with the first batch of events, starting with
the StartDocument event,
- * already loaded on the inputter's queue.
- * This is collected and stored in the batchedInfosetEvents member, and the
currentIndex is set to 0
- * - The dp.unparse method is called, and it calls hasNext to make sure an
event exists to be
- * processed and then queries the event at currentIndex. The hasNext call also
checks that there is
- * a next event to be processed (currentIndex+1), and if not, queues the next
batch of events by
- * transferring control to the contentHandler so it can load them.
- * - After it is done with the current event, it calls inputter.next to get
the next event, and that
- * increments the currentIndex and cleans out the event at the previous index
- * - This process continues until the event at currentIndex either contains an
EndDocument event or
- * the currentIndex is the last in the batch. If it is the former, the
endDocumentReceived flag is
- * set to true and hasNext will return false. If it is the latter, the next
batch of events will be
- * queued by transferring control to the contentHandler so it can load them.
- * - This ends the unparse process, and the unparseResult and/or any Errors
are set on a single element
- * array containing response events. We call resumeFinal passing along that
array, terminating this
- * thread and resuming the contentHandler for the last time.
- *
- * @param unparseContentHandler producer coroutine that sends the
SAXInfosetEvent to this class
- * @param dp dataprocessor that we use to kickstart the unparse process and
that consumes the
- * currentEvent
- * @param output outputChannel of choice where the unparsed data is stored
+ * @param unparseContentHandler main coroutine that sends the SAXInfosetEvents
to this class
+ * @param dp DataProcessor that we use to start the unparse and provide
infoset information
+ * @param output OutputChannel where the unparsed data is written
+ * @param resolveRelaiveInfosetBlobURIs if true, elements with type xs:anyURI
type and a
+ * relativeURI are resolved relative to the classpath. This should only be
true when used
+ * via a TDMLRunner or similar testing infrastructure
*/
class SAXInfosetInputter(
- unparseContentHandler: DFDL.DaffodilUnparseContentHandler,
+ unparseContentHandler: DaffodilUnparseContentHandler,
Review Comment:
After this line changing DFDL.DaffodilUnparseContentHandler to
DaffodilUnparseContentHandler, I believe there is only one place left in the
codebase which uses DFDL.DaffodilUnparseContentHandler in an API-visible place
(DataProcessor's newContentHandlerInstance method):
```rg
daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/DataProcessor.scala
257: override def newContentHandlerInstance(output: DFDL.Output):
DFDL.DaffodilUnparseContentHandler =
```
When Mike and I wrote the C code generator (runtime2), we decided we didn't
want or need to implement all of DFDL.DataProcessor so we split it into
DFDL.DataProcessorBase and DFDL.DataProcessor. Runtime2DataProcessor didn't
need newContentHandlerInstance at all. Now, speaking from my experience
working on a second code generator and integrating it into Daffodil, I would
say we didn't even need Runtime2DataProcessor either. We needed only
Runtime2TDMLDFDLProcessor to run TDML tests with DaffodilC and we could have
put all of Runtime2DataProcessor's code in Runtime2TDMLDFDLProcessor instead of
Runtime2DataProcessor. I think the only parts of Daffodil we need to add or
override for another code generator are another language in the "daffodil
generate \<language\>" CLI, another new generator backend, and another
Daffodill\<Language\>TDMLDFDLProcessor.
No real point here, just making some observations such as that there doesn't
seem to be any motivation for a programmer to take advantage of Daffodil's
extensible DFDL.* API to replace Compiler, DataProcessor,
DaffodilUnparseContentHandler, etc. (it's hard to know that except in hindsight
and new use cases still might appear, though).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]