stevedlawrence commented on PR #879: URL: https://github.com/apache/daffodil/pull/879#issuecomment-1331224033
> Can this be set small enough as to enforce sequential behavior, i.e., no parallelism between the caller of the content handler and the unparser?

With this approach, no. Even with the batch size tunable set to 1, the ContentHandler and SAXInfosetInputter can (and likely will) both do work at the same time. The ContentHandler will be preparing the next event while the SAXInfosetInputter is unparsing using the current event.

> To reduce overhead, we need to enqueue many events before context switching and allowing the unparser to run. Arguably, we should just queue up events to some max count, or until we get endDocument. For small messages we would then get exactly one context switch per message.

Some of the changes here weren't specific to the ArrayBlockingQueue approach (e.g. thread pool reuse, split() removal). I can apply them to the current coroutine approach and see how it compares. Together these changes gave a big speedup, but I'm not sure which individual changes had the biggest effect.

> I continue to be of the opinion that overlap parallelism here is not an advantage. It just muddies the waters about timing and overhead of unparsing.

I think one potential advantage of this parallel approach is when the incoming SAX events are sporadic or relatively slow (e.g. serialized over a network/diode). With the coroutine approach, we won't attempt any unparsing until we get a full batch of events (or we reach the endDocument event). If the batch size is set to something large to avoid context switching, that could mean a lot of time spent waiting and doing nothing until those events arrive. With this parallel approach, we can at least start unparsing immediately and do work while waiting for more SAX events. Though I'm not sure how likely that scenario is with SAX, so maybe it's not really worth considering. And if it does come up, maybe tuning the batch size to a small number is the right approach.

-- 
This is an automated message from the Apache Git Service.
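For illustration only, the batching trade-off discussed in the comment above can be sketched in Java with a bounded `ArrayBlockingQueue`. This is not Daffodil's implementation: the class `BatchedEvents`, the method `run`, the string-valued events, and the queue capacity of 1 are all assumptions made for the example. The calling thread stands in for the ContentHandler and a second thread stands in for the SAXInfosetInputter/unparser side. A batch is handed over when it reaches `batchSize` or when `endDocument` is seen, so a small message with a large batch size costs a single hand-off, while a batch size of 1 still lets the producer prepare the next event while the consumer works on the current one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch; names and structure are not taken from Daffodil.
class BatchedEvents {
    static final String END_DOCUMENT = "endDocument";

    // Feeds `events` to a consumer thread in batches of at most `batchSize`
    // (must be >= 1) and returns the events in the order the consumer saw them.
    // Assumes the last event is END_DOCUMENT; otherwise the consumer never stops.
    static List<String> run(List<String> events, int batchSize) throws InterruptedException {
        // Capacity 1: the producer can build the next batch while the consumer
        // is busy, but blocks if it gets more than one batch ahead.
        BlockingQueue<List<String>> queue = new ArrayBlockingQueue<>(1);
        List<String> consumed = new ArrayList<>();

        // Stand-in for the unparser side: take batches until endDocument arrives.
        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    List<String> batch = queue.take();
                    consumed.addAll(batch);
                    if (batch.get(batch.size() - 1).equals(END_DOCUMENT)) {
                        return;
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        // Stand-in for the ContentHandler side: accumulate events and hand off
        // a batch when it is full or when the document ends.
        List<String> batch = new ArrayList<>(batchSize);
        for (String event : events) {
            batch.add(event);
            if (batch.size() == batchSize || event.equals(END_DOCUMENT)) {
                queue.put(batch);
                batch = new ArrayList<>(batchSize);
            }
        }

        // join() makes the consumer's writes to `consumed` visible here.
        consumer.join();
        return consumed;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> events = Arrays.asList(
            "startDocument", "startElement", "characters", "endElement", END_DOCUMENT);
        System.out.println(run(events, 2));
    }
}
```

In this sketch a large `batchSize` minimizes hand-offs but delays the consumer until a batch fills or `endDocument` arrives, which is the waiting cost described for slow or sporadic event sources. A small `batchSize` lets the consumer start early at the price of more hand-offs.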
