[GitHub] [daffodil] stevedlawrence commented on pull request #879: Improve SAX parse/unparse performance

GitBox Tue, 29 Nov 2022 11:58:17 -0800


stevedlawrence commented on PR #879:
URL: https://github.com/apache/daffodil/pull/879#issuecomment-1331224033


   > Can this be set small enough as to enforce sequential behavior, i.e., no 
parallelism between the caller of the content handler and the unparser?
   
   With this approach, no. Even with the batch size tunable set to 1, the 
ContentHandler and SAXInfosetInputter can (and likely will) both do work at the 
same time. The ContentHandler will be preparing the next event while the 
SAXInfosetInputter is unparsing using the current event.
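   
   To illustrate why a batch size of 1 still overlaps, here is a minimal sketch of a batched handoff through an ArrayBlockingQueue. It is not the code in this PR: the class name, the `run` method, and the use of plain strings as events are all hypothetical, and it assumes the stream ends with exactly one `endDocument` event. The producer plays the role of the ContentHandler and the consumer thread plays the role of the SAXInfosetInputter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: illustrative names only, not Daffodil's actual classes.
final class BatchedHandoffSketch {
    static final String END_DOCUMENT = "endDocument";

    // Groups events into batches of at most batchSize, hands each batch to a
    // consumer thread through a bounded queue, and returns the events in the
    // order the consumer saw them.
    static List<String> run(List<String> events, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        // Assumption: endDocument appears exactly once, as the last event.
        if (events.isEmpty() || events.indexOf(END_DOCUMENT) != events.size() - 1) {
            throw new IllegalArgumentException("events must end with a single endDocument");
        }

        BlockingQueue<List<String>> queue = new ArrayBlockingQueue<>(1);
        List<String> consumed = new ArrayList<>();

        // Consumer: stands in for the unparser side.
        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    List<String> batch = queue.take(); // blocks until a batch is ready
                    consumed.addAll(batch);            // placeholder for unparsing work
                    if (batch.contains(END_DOCUMENT)) {
                        return;
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        // Producer: stands in for the content handler side.
        try {
            List<String> batch = new ArrayList<>(batchSize);
            for (String event : events) {
                batch.add(event);
                if (batch.size() == batchSize || event.equals(END_DOCUMENT)) {
                    // put() only blocks while the queue is full, so the producer
                    // can go on building the next batch while the consumer is
                    // still working on the batch it already took.
                    queue.put(batch);
                    batch = new ArrayList<>(batchSize);
                }
            }
            consumer.join(); // also makes the consumer's writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return consumed;
    }
}
```

   Once the consumer has taken a batch, the queue has a free slot again, so the producer is never forced to wait for that batch to be fully processed. Enforcing strictly sequential behavior would need a different handoff (e.g. one where the producer blocks until the consumer signals it is done), which is not what this sketch does.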
   
   > To reduce overhead, we need to enqueue many events before context 
switching and allowing the unparser to run. Arguably, we should just queue up 
events to some max count, or until we get endDocument. For small messages we 
would then get exactly one context switch per message.
   
   Some of the changes here weren't specific to the ArrayBlockingQueue approach 
(e.g. thread pool reuse, split() removal). I can apply them to the current 
coroutine approach and see how it compares. Taken together, these changes 
definitely gave a big speedup, but I'm not sure which individual changes had the 
biggest effect.
   
   > I continue to be of the opinion that overlap parallelism here is not an 
advantage. It just muddies the waters about timing and overhead of unparsing.
   
   I think one potential advantage of this parallel approach shows up when the 
incoming SAX events are sporadic or relatively slow (e.g. serialized over a 
network/diode). With the coroutine approach, we won't attempt to do any 
unparsing until we get a full batch of events (or we reach the endDoc event). 
And if the batch size is set to something large to avoid context switching, it 
might mean a lot of idle waiting until we get those events. With this 
parallel approach, we can at least start unparsing immediately and do work 
while waiting for those SAX events. Though I'm not sure how likely that 
scenario is with SAX, so maybe it's not really worth considering. And if it 
does come up, maybe tuning the batch size to a small number is the right 
approach.
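
   As a rough illustration of that waiting cost, here is a back-of-the-envelope model, not a measurement. It assumes events arrive at a fixed interval and that unparsing only starts once a full batch (or the whole message, if shorter) has arrived. The class name, method name, and parameters are all hypothetical.

```java
// Hypothetical model of the delay before unparsing can begin when events
// must be collected into a full batch first. Not a measurement of Daffodil.
final class FirstUnparseDelaySketch {
    // Assumes the i-th event (1-based) arrives at time i * gapMillis.
    // Unparsing starts when min(batchSize, totalEvents) events have arrived.
    static long delayMillis(int batchSize, int totalEvents, long gapMillis) {
        int eventsNeeded = Math.min(batchSize, totalEvents);
        return eventsNeeded * gapMillis;
    }
}
```

   Under these assumptions, a batch size of 1000 with one event every 10 ms means 10 seconds before any unparsing starts, versus 10 ms with a batch size of 1.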


