vrish88 opened a new pull request, #130: URL: https://github.com/apache/incubator-annotator/pull/130
Hello, and thank you for this wonderful project. It's provided some excellent shoulders to stand on.

### Context

I'm extracting footnotes embedded in Markdown and converting them to annotations. Some of these Markdown files contain over 500k characters and more than 100 footnotes. After quite a circuitous route, I'm using mdast/hast/remark to convert the Markdown into HTML and then loading the HTML into a jsdom Document.

### The Problem

I found that extracting footnotes from some of the larger files was taking 7-10 minutes. Running a profiler, it looked like 70% of the time was spent determining whether the node intersected the document/scope. That [call is happening](https://github.com/apache/incubator-annotator/blob/main/packages/dom/src/text-node-chunker.ts#L64) when the node is being converted to a chunk, which happens many times per annotation. It is also only being used to ensure that the node is a part of the document (as far as I can tell).

### The Solution

This PR removes that check. It improved the performance on my machine by 75% for the large files. Behaviorally, I _think_ it is the same: the two things which invoke `nodeToChunk` appear to already check whether those nodes are a part of the scope.
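As a rough illustration of why a check inside `nodeToChunk` adds up, here is a minimal, DOM-free sketch. It is not the library's code: `FakeNode`, `FakeScope`, `SketchChunker`, and `countChecks` are hypothetical names invented for this example, and `intersects` is only a stand-in for the scope check described above. It counts how often the check runs when it lives inside the conversion versus when the caller validates the node once.

```typescript
// Hypothetical stand-in for a DOM text node; not the library's own type.
interface FakeNode {
  text: string;
}

// Hypothetical stand-in for the scope. `intersects` plays the role of the
// check this PR removes, and counts how often it is called.
class FakeScope {
  checks = 0;

  constructor(private readonly nodes: FakeNode[]) {}

  intersects(node: FakeNode): boolean {
    this.checks += 1;
    return this.nodes.indexOf(node) !== -1; // linear scan, deliberately not free
  }
}

interface Chunk {
  node: FakeNode;
  data: string;
}

// Simplified chunker: `checkPerChunk` toggles the check inside the conversion.
class SketchChunker {
  constructor(
    private readonly scope: FakeScope,
    private readonly checkPerChunk: boolean,
  ) {}

  nodeToChunk(node: FakeNode): Chunk {
    if (this.checkPerChunk && !this.scope.intersects(node)) {
      throw new Error("node is outside the scope");
    }
    return { node, data: node.text };
  }
}

// Validates a node once at the call site, converts it `conversions` times,
// and returns how many scope checks ran in total.
function countChecks(checkPerChunk: boolean, conversions: number): number {
  const nodes: FakeNode[] = [{ text: "a" }, { text: "b" }, { text: "c" }];
  const scope = new FakeScope(nodes);
  const chunker = new SketchChunker(scope, checkPerChunk);

  const node = nodes[1];
  if (!scope.intersects(node)) {
    throw new Error("node is outside the scope");
  }
  for (let i = 0; i < conversions; i += 1) {
    chunker.nodeToChunk(node);
  }
  return scope.checks;
}

console.log(countChecks(true, 100)); // 101: one caller check plus one per conversion
console.log(countChecks(false, 100)); // 1: only the caller check
```

The sketch only shows the call-count difference. Whether dropping the check is safe in the real chunker depends on every caller of `nodeToChunk` validating its nodes first, which is the assumption this PR makes.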