Dear Annotator Developers,

I am seeing a strange off-by-one error. I am using Python to convert an HTML document plus annotations into a tagged text document. To check that my code works properly, I want to recreate the quote field stored in an annotation from the original HTML document and the range field. Here is some pseudo-code describing how I am recreating the quote:

    def get_range(doc, start, end, startOffset, endOffset):
        started = False
        ended = False
        nchars_past_end = 0  # characters seen since the end element was reached
        result = ''
        # doc.iterator() yields (path, text) pairs in document order
        for path, text in doc.iterator():
            if path.endswith(start):
                started = True
            if path.endswith(end):
                ended = True
            if started:
                result += text
            if ended:
                nchars_past_end += len(text)
                if nchars_past_end >= endOffset:
                    break
        # drop whatever was collected beyond endOffset in the end element
        cutoff = nchars_past_end - endOffset
        return result[startOffset:(len(result) - cutoff)]

This code works with these annotations and this HTML document:

https://github.com/amarder/hal/blob/master/tagger/data_highlights.json
https://github.com/amarder/hal/blob/master/tagger/data_filing.html

But when I test the code with the following annotations and HTML document, my quotes are shifted one character to the left (there is one extra character at the beginning of the string and one character missing at the end):

https://github.com/amarder/hal/blob/master/tagger/data_text_highlights.json
https://github.com/amarder/hal/blob/master/tagger/data_text_filing.html

I've been looking at the source code here:

https://github.com/openannotation/xpath-range/blob/master/src/range.coffee

But I haven't been able to figure out why I'm having this issue. Any thoughts on what might be happening here would be greatly appreciated!

Andrew

PS: I used Annotator v1.1.0 to create these annotations.
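
To make the check concrete, here is a minimal self-contained sketch of the same logic. It assumes the document has already been flattened into a list of (path, text) pairs in document order. The names rebuild_quote and check_annotation and the toy nodes list are illustrative only and are not part of Annotator; the annotation layout (a "quote" field plus a "ranges" list with start, startOffset, end, endOffset) is assumed to match what Annotator stores.

```python
def rebuild_quote(nodes, start, end, start_offset, end_offset):
    """Rebuild a quote from (path, text) pairs and one range.

    Assumption: offsets count characters from the beginning of the
    start and end elements, respectively.
    """
    started = False
    ended = False
    chars_in_end = 0  # characters collected since the end element was reached
    result = ''
    for path, text in nodes:
        if path.endswith(start):
            started = True
        if path.endswith(end):
            ended = True
        if started:
            result += text
        if ended:
            chars_in_end += len(text)
            if chars_in_end >= end_offset:
                break
    # Trim the characters collected beyond end_offset in the end element.
    cutoff = chars_in_end - end_offset
    return result[start_offset:len(result) - cutoff]


def check_annotation(nodes, annotation):
    """Return (matches, rebuilt) for the first range of an annotation."""
    r = annotation['ranges'][0]
    rebuilt = rebuild_quote(nodes, r['start'], r['end'],
                            r['startOffset'], r['endOffset'])
    return rebuilt == annotation['quote'], rebuilt


# Toy document: two paragraphs, one text node each (hypothetical data).
nodes = [
    ('/div[1]/p[1]', 'Hello world. '),
    ('/div[1]/p[2]', 'Second paragraph here.'),
]
annotation = {
    'quote': 'world. Second',
    'ranges': [{'start': '/p[1]', 'startOffset': 6,
                'end': '/p[2]', 'endOffset': 6}],
}
ok, rebuilt = check_annotation(nodes, annotation)
print(ok, repr(rebuilt))
```

Running check_annotation over every annotation in a file and printing the rebuilt string next to the stored quote whenever they differ makes a consistent one-character shift easy to spot.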