[GitHub] [royale-asjs] estanglerbm opened a new pull request #1025: Speed up XML parsing of larger datasets by 30-50%

GitBox Wed, 30 Dec 2020 20:31:55 -0800


estanglerbm opened a new pull request #1025:
URL: https://github.com/apache/royale-asjs/pull/1025

Speed up XML parsing of larger datasets by 30-50% by using faster DOM APIs,
shorter string lookups, eliminating unnecessary string copies, and other
changes.

Originally, about 80% of the elapsed time was spent in post-text-parsing code
(node traversal, building XML objects, etc.) and 20% in text-parsing code
(parseFromString). Now, the time in text-parsing code dominates.

The hoops needed to use Element.getAttributeNames() (which also requires a
newer browser) are partly due to this version of GCC (Google Closure Compiler)
not supporting it (it was added around 2017, I think).
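
For illustration, the feature check and fallback can be sketched as below. This
is a minimal sketch and not the code in this PR: `ElementLike`, `AttrLike`, and
`attributeNamesOf` are hypothetical names, and the structural types stand in
for the DOM types so the snippet does not depend on any externs.

```typescript
// Hypothetical structural types; a real DOM Element satisfies ElementLike.
interface AttrLike {
  name: string;
}

interface ElementLike {
  attributes: ArrayLike<AttrLike>;
  // Optional because older browsers (and older externs) lack it.
  getAttributeNames?: () => string[];
}

// Hypothetical helper: prefer the native call, else walk `attributes`.
function attributeNamesOf(el: ElementLike): string[] {
  if (typeof el.getAttributeNames === "function") {
    return el.getAttributeNames();
  }
  const names: string[] = [];
  const attrs = el.attributes;
  for (let i = 0; i < attrs.length; i++) {
    names.push(attrs[i].name);
  }
  return names;
}
```

The fallback returns names in `attributes` order, so callers see the same
shape of result on either path.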

One (not large) bottleneck is the node traversal, which may be visiting each
node twice (due to recursion plus nextSibling; I just wrote a faster version
of the existing traversal). There may be some speedup from using TreeWalker to
do nextNode() traversal and using a map to hook up the parent relationships,
but that is not worth the effort at the moment.
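
As a point of comparison, a non-recursive walk that touches each node exactly
once can be sketched with only firstChild, nextSibling, and parentNode. This is
a rough sketch and not the traversal in this PR: `NodeLike` and `visitOnce` are
hypothetical names, and the structural type stands in for a DOM Node.

```typescript
// Hypothetical structural type; a real DOM Node satisfies it.
interface NodeLike {
  nodeName: string;
  firstChild: NodeLike | null;
  nextSibling: NodeLike | null;
  parentNode: NodeLike | null;
}

// Pre-order walk of the subtree under `root`, calling `visit` once per node.
function visitOnce(root: NodeLike, visit: (node: NodeLike) => void): void {
  let node: NodeLike = root;
  for (;;) {
    visit(node);
    const child = node.firstChild;
    if (child !== null) {
      node = child; // descend first
      continue;
    }
    // No children: climb until some ancestor has a next sibling.
    let next = node.nextSibling;
    while (next === null) {
      const parent = node.parentNode;
      if (node === root || parent === null) return;
      node = parent;
      next = node.nextSibling;
    }
    // Never step to a sibling of the root; that is outside the subtree.
    if (node === root) return;
    node = next;
  }
}
```

Because the walk follows parentNode on the way back up, no explicit stack or
parent map is needed.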

Another small bottleneck is the trim for whitespace nodes. A filter during
TreeWalker or during parsing would help here, to just eliminate the whitespace
nodes from the traversal.
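
A different, smaller option than filtering is to test for whitespace-only text
without allocating a trimmed copy. The sketch below is not part of this PR;
`isWhitespaceOnly` is a hypothetical name, and it assumes only the four XML
whitespace characters matter, which is narrower than what String trim() strips.

```typescript
// Returns true if `text` is empty or contains only XML whitespace
// (space, tab, line feed, carriage return). No new string is created.
function isWhitespaceOnly(text: string): boolean {
  for (let i = 0; i < text.length; i++) {
    const c = text.charCodeAt(i);
    if (c !== 0x20 && c !== 0x09 && c !== 0x0a && c !== 0x0d) {
      return false; // first non-whitespace character ends the scan
    }
  }
  return true;
}
```

The same predicate could also serve as the test inside a traversal filter, so
whitespace nodes are skipped before any XML object is built for them.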

The big bottleneck (by far) is now DOMParser.parseFromString(). I had mixed
results using createContextualFragment(), and that executes script, so not
ideal. I think it may be best to replace parseFromString() with something like
[fast-xml-parser](https://github.com/NaturalIntelligence/fast-xml-parser/tree/master/lib).
Bonus points if it has a callback where we can build XML objects without
actually materializing DOM objects. I haven't tried using an external library
yet.

Passes existing XML tests.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
