stevedlawrence commented on code in PR #908:
URL: https://github.com/apache/daffodil/pull/908#discussion_r1067290881
##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/DaffodilUnparseContentHandler.scala:
##########
@@ -28,156 +28,223 @@ import org.xml.sax.Locator
 import org.apache.daffodil.api.DFDL
 import org.apache.daffodil.api.DFDL.DaffodilUnhandledSAXException
 import org.apache.daffodil.api.DFDL.DaffodilUnparseErrorSAXException
-import org.apache.daffodil.api.DFDL.SAXInfosetEvent
 import org.apache.daffodil.exceptions.Assert
 import org.apache.daffodil.infoset.InfosetInputterEventType.EndDocument
 import org.apache.daffodil.infoset.InfosetInputterEventType.EndElement
 import org.apache.daffodil.infoset.InfosetInputterEventType.StartDocument
 import org.apache.daffodil.infoset.InfosetInputterEventType.StartElement
+import org.apache.daffodil.infoset.SAXInfosetEvent
 import org.apache.daffodil.infoset.SAXInfosetInputter
 import org.apache.daffodil.util.MStackOf
+import org.apache.daffodil.util.MainCoroutine
 import org.apache.daffodil.util.Maybe
 import org.apache.daffodil.util.Maybe.Nope
 import org.apache.daffodil.util.Maybe.One
 import org.apache.daffodil.util.Misc
 /**
- * DaffodilUnparseContentHandler produces SAXInfosetEvent objects for the SAXInfosetInputter to
- * consume and convert to events that the Dataprocessor unparse can use. The SAXInfosetEvent object
- * is built from information that is passed to the ContentHandler from an XMLReader parser. In
- * order to receive the uri and prefix information from the XMLReader, the XMLReader must have
- * support for XML Namespaces
+ * Unparse SAX events received from an XMLReader using a provided DataProcessor and
+ * Output channel
  *
- * This class, together with the SAXInfosetInputter, uses coroutines to ensure that a batch of events
- * (based on the tunable saxUnparseEventBatchSize) can be passed from the former to the latter.
- * The following is the general process:
+ * Note: XMLReaders using this as their ContentHandler must have support for XML
+ * namespaces so that we receive namespace URI and prefix information that Daffodil
+ * requires to unparse.
  *
- * - an external call is made to parse an XML Document
- * - this class receives a StartDocument call, which is the first SAXInfosetEvent that should be
- * sent to the SAXInfosetInputter. That event is put onto an array of SAXInfosetEvents of size the
- * saxUnparseEventBatchSize tunable. Once the array is full, it is put on the inputter's queue,
- * this thread is paused, and that inputter's thread is run
- * - when the SAXInfosetInputter is done processing that batch and is ready for a new batch, it
- * sends a 1 element array with the last completed event via the coroutine system, which loads it on
- * the contentHandler's queue, which restarts this thread and pauses that one. In the expected case,
- * the single element array will contain no new information until the unparse complete. In the case of
- * an unexpected error though, it will contain error information
- * - this process continues until the EndDocument SAXInfosetEvent is loaded into the batch.
- * Once that SAXInfosetEvent is processed by the SAXInfosetInputter, it signals the end of batched
- * events coming from the contentHandler. This ends the unparseProcess and returns the event with
- * the unparseResult and/or any error
- * information
+ * The SAX ContentHandler API is push-based, but the Daffodil InfosetInputter unparse
+ * API is pull-based, so these two API's are at odds with one another. To link the
+ * two, we create two classes that implement a coroutine-like API to communicate and
+ * ensure that the push and pull sides of the two APIs never run at the same time
+ * (see Coroutine.scala for implementation details). The "producer" coroutine is this
+ * DaffodilUnparseContentHandler and runs on the same thread as an XMLReader to
+ * receive and batch SAX events. The "consumer" coroutine is an instance of the

Review Comment:
Do you think we should remove the terms "producer" and "consumer" entirely? As I was writing this, I became a bit concerned that DaffodilUnparseContentHandler could be seen as both a producer and a consumer. It "consumes" SAX events from the XMLReader and the UnparseResult from the SAXInfosetInputter, and it "produces" SAXInfosetEvents for the SAXInfosetInputter. I wonder if just using "main" and "peer" is enough of a distinction, as long as the documentation makes it clear what each class receives from and sends to the other?
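The new note says XMLReaders must support XML namespaces. As an illustration that uses only the standard JAXP/SAX APIs (not Daffodil's), this sketch shows what a ContentHandler receives with and without namespace awareness. With namespace awareness on, `startElement` gets the namespace URI and `startPrefixMapping` reports the prefix. With it off, the URI argument is the empty string and no prefix mappings are reported. `RecordingHandler` and `NamespaceDemo` are hypothetical names.

```scala
import java.io.StringReader
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.{Attributes, InputSource}
import org.xml.sax.helpers.DefaultHandler
import scala.collection.mutable.ArrayBuffer

// Hypothetical ContentHandler that records what the XMLReader reports.
final class RecordingHandler extends DefaultHandler {
  val prefixes = ArrayBuffer.empty[(String, String)] // (prefix, namespace URI)
  val starts = ArrayBuffer.empty[(String, String)]   // (namespace URI, local name)

  override def startPrefixMapping(prefix: String, uri: String): Unit =
    prefixes += ((prefix, uri))

  override def startElement(uri: String, localName: String, qName: String, atts: Attributes): Unit =
    starts += ((uri, localName))
}

object NamespaceDemo {
  // Parse xml with a JAXP-created XMLReader, optionally namespace-aware.
  def parse(xml: String, namespaceAware: Boolean): RecordingHandler = {
    val factory = SAXParserFactory.newInstance()
    // Turns on SAX namespace processing for readers created by this factory.
    factory.setNamespaceAware(namespaceAware)
    val reader = factory.newSAXParser().getXMLReader
    val handler = new RecordingHandler
    reader.setContentHandler(handler)
    reader.parse(new InputSource(new StringReader(xml)))
    handler
  }
}
```

Parsing `<ex:root xmlns:ex="urn:example"><ex:child/></ex:root>` with `namespaceAware = true` records the prefix mapping `("ex", "urn:example")` and the element `("urn:example", "root")`. With `false`, every URI is `""`, which is the namespace information the note says Daffodil needs in order to unparse.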

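To make the "main"/"peer" naming concrete, here is a minimal sketch of the push/pull handoff the new comment describes. It uses a blocking `SynchronousQueue` handoff between two threads as a stand-in for Daffodil's Coroutine.scala, whose API is not shown here. All names (`Handoff`, `MainSide`, `PeerSide`, `HandoffDemo`, and the `Event` types standing in for SAXInfosetEvent) are hypothetical. Each side is named for where it runs, and each method's comment states what it sends and receives. That way neither side has to be called a "producer" or a "consumer".

```scala
import java.util.concurrent.SynchronousQueue
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-ins for SAXInfosetEvent.
sealed trait Event
case object StartDoc extends Event
final case class StartElem(name: String) extends Event
final case class EndElem(name: String) extends Event
case object EndDoc extends Event

// Strict handoff between "main" (the caller's thread, like the XMLReader's)
// and "peer" (its own thread). Each side blocks right after handing control
// to the other, so the two sides never do work at the same time.
final class Handoff {
  private val toPeer = new SynchronousQueue[Vector[Event]]()
  private val toMain = new SynchronousQueue[Boolean]() // true = peer finished

  // main: send a batch of events to peer, then block until peer replies.
  def resumePeer(batch: Vector[Event]): Boolean = { toPeer.put(batch); toMain.take() }
  // peer: reply to main, saying whether it has finished.
  def resumeMain(done: Boolean): Unit = toMain.put(done)
  // peer: block until main sends the next batch.
  def awaitBatch(): Vector[Event] = toPeer.take()
}

// Main side: receives pushed events (as a SAX ContentHandler would) and
// sends them to peer in batches of batchSize, or early once EndDoc arrives.
final class MainSide(h: Handoff, batchSize: Int) {
  private val buf = ArrayBuffer.empty[Event]
  var peerDone = false
  def onEvent(e: Event): Unit = {
    buf += e
    if (buf.size == batchSize || e == EndDoc) {
      val batch = buf.toVector
      buf.clear()
      peerDone = h.resumePeer(batch)
    }
  }
}

// Peer side: pulls one event at a time (as an InfosetInputter would) and
// asks main for more only when the current batch is used up.
final class PeerSide(h: Handoff) {
  private var batch = Vector.empty[Event]
  private var i = 0
  def next(): Event = {
    if (i == batch.size) {
      if (batch.nonEmpty) h.resumeMain(done = false)
      batch = h.awaitBatch()
      i = 0
    }
    i += 1
    batch(i - 1)
  }
}

object HandoffDemo {
  // events must end with EndDoc; otherwise both threads stay blocked.
  def run(events: List[Event], batchSize: Int): (Vector[Event], Boolean) = {
    val h = new Handoff
    val pulled = ArrayBuffer.empty[Event]
    val peer = new Thread(() => {
      val p = new PeerSide(h)
      var e = p.next()
      pulled += e
      while (e != EndDoc) { e = p.next(); pulled += e }
      h.resumeMain(done = true) // final reply; the peer thread then exits
    })
    peer.start()
    val main = new MainSide(h, batchSize)
    events.foreach(main.onEvent)
    peer.join()
    (pulled.toVector, main.peerDone)
  }
}
```

In this sketch, the batch size only changes how often control switches between main and peer. It does not change which events peer sees. Error reporting from peer back to main, which the old comment describes, is left out.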