stevedlawrence opened a new pull request, #1159: URL: https://github.com/apache/daffodil/pull/1159
The XMLTextInfosetOutputter and JSONInfosetOutputter do not use any buffering when writing data. Modifying them to wrap a BufferedWriter around the existing OutputStreamWriter gives significant performance improvements. Using a BufferedWriter also allows us to use its built-in newLine() function for pretty printing.

This also modifies the "Standard" XML escape style in the XML infoset outputter so that it first checks whether any characters need to be escaped, similar to what we do for the CDATA escape style. In most cases no characters need escaping, so we can avoid Scala XML's escape utility, which has noticeable overhead even when nothing needs escaping.

With these changes, tested on a large file with lots of strings, total parse + infoset output time dropped from about 125 seconds to 93 seconds, about a 25% decrease. Note that parsing with the null infoset outputter takes about 78 seconds, so the XML infoset outputter overhead went from about 37% of the total parse time down to about 20%.

DAFFODIL-2872
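For readers who want a picture of the two techniques described above (buffering the writer, and checking for escapable characters before calling an escape routine), here is a minimal sketch in Java. It is not the code from this pull request: the class `BufferedOutputSketch`, its method names, and the set of escaped characters are all assumptions made for illustration, and the hand-written `escape` method is only a stand-in for Scala XML's escape utility.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; names are illustrative, not Daffodil's.
final class BufferedOutputSketch {

    // Wrap the OutputStreamWriter in a BufferedWriter so that many small
    // writes are collected in memory before reaching the underlying stream.
    static BufferedWriter bufferedWriter(OutputStream os) {
        return new BufferedWriter(new OutputStreamWriter(os, StandardCharsets.UTF_8));
    }

    // Cheap pre-check: does the string contain any character that the
    // escaper below would rewrite? (Assumed character set: < > & ")
    static boolean needsEscape(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '<' || c == '>' || c == '&' || c == '"') {
                return true;
            }
        }
        return false;
    }

    // Stand-in for the slower, general-purpose escape routine.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '&': sb.append("&amp;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Fast path: write the string untouched unless the pre-check finds
    // something to escape.
    static void writeText(Writer w, String s) throws IOException {
        w.write(needsEscape(s) ? escape(s) : s);
    }

    // Usage example: write each value on its own line through the buffered
    // writer. newLine() emits the platform line separator; closing the
    // writer flushes the buffer.
    static String renderLines(String... lines) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (BufferedWriter w = bufferedWriter(out)) {
            for (String line : lines) {
                writeText(w, line);
                w.newLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

The pre-check costs one pass over the string, and when it finds nothing the write path allocates no new string or builder. Note that newLine() uses the platform line separator, which may matter if output must be byte-identical across operating systems.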
