vrish88 opened a new pull request, #130:
URL: https://github.com/apache/incubator-annotator/pull/130

   Hello and thank you for this wonderful project. It's provided some excellent 
shoulders to stand on.
   
   ### Context
   I'm extracting footnotes embedded in markdown and converting them to 
annotations. Some of these markdown files have over 500k characters and more 
than 100 footnotes. After quite a circuitous route, I'm using 
mdast/hast/remark to convert the markdown into html and then loading the html 
into a jsdom Document.
   
   ### The Problem
   I found that extracting footnotes for some of the larger files was taking 7 
to 10 minutes. Running a profiler, it looked like 70% of the time was spent 
determining whether a node intersected the document/scope.
   
![image](https://user-images.githubusercontent.com/36475/187796176-82638def-398a-405a-b78c-6a0177f2f04b.png)
   
   That [call is 
happening](https://github.com/apache/incubator-annotator/blob/main/packages/dom/src/text-node-chunker.ts#L64)
 when a node is being converted to a chunk, which happens many times per 
annotation. It is also only being used to ensure that the node is a part of the 
document (as far as I can tell).
   
   ### The Solution
   This PR removes that check. It improved the performance on my machine by 75% 
for the large files.
   
   Behaviorally I _think_ it is the same. The two things which invoke 
`nodeToChunk` appear to be already checking if those nodes are a part of the 
scope.
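
   To illustrate why the check looks redundant, here is a minimal sketch. It 
does not use the real `@apache-annotator/dom` code: `NodeLike`, `CountingScope`, 
`Chunker`, and `countScopeChecks` are hypothetical stand-ins, and it assumes the 
callers of `nodeToChunk` already restrict themselves to in-scope nodes. Under 
that assumption, repeating the check inside `nodeToChunk` adds one scope query 
per node without changing which chunks are produced.

```typescript
// Hypothetical stand-ins for illustration only; these are NOT the
// actual types or classes from @apache-annotator/dom.
interface NodeLike {
  id: number;
}

// A fake scope that counts how often it is asked about a node.
class CountingScope {
  calls = 0;
  private members: Set<number>;

  constructor(members: Set<number>) {
    this.members = members;
  }

  intersectsNode(node: NodeLike): boolean {
    this.calls += 1;
    return this.members.has(node.id);
  }
}

class Chunker {
  private scope: CountingScope;
  private recheck: boolean;

  constructor(scope: CountingScope, recheck: boolean) {
    this.scope = scope;
    this.recheck = recheck;
  }

  // Stand-in for nodeToChunk: optionally repeats the scope check.
  nodeToChunk(node: NodeLike): { node: NodeLike } {
    if (this.recheck && !this.scope.intersectsNode(node)) {
      throw new Error('node falls outside of the scope');
    }
    return { node };
  }

  // Stand-in for a caller that (by assumption) only passes in-scope nodes.
  chunks(nodes: NodeLike[]): { node: NodeLike }[] {
    return nodes
      .filter((n) => this.scope.intersectsNode(n))
      .map((n) => this.nodeToChunk(n));
  }
}

// Runs the same input with or without the repeated check and reports
// how many scope queries were made and how many chunks came out.
function countScopeChecks(recheck: boolean): { calls: number; chunks: number } {
  const scope = new CountingScope(new Set([1, 2, 3]));
  const chunker = new Chunker(scope, recheck);
  const nodes: NodeLike[] = [1, 2, 3, 4].map((id) => ({ id }));
  const result = chunker.chunks(nodes);
  return { calls: scope.calls, chunks: result.length };
}
```

In this toy setup both variants yield the same chunks, and the only difference 
is the number of scope queries. Whether the real callers filter as strictly as 
this sketch assumes is exactly the part that would benefit from a review.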


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@annotator.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
