[GitHub] [incubator-annotator] Treora opened a new issue #85: ‘Chunking’ abstraction

GitBox Mon, 17 Aug 2020 04:28:59 -0700


Treora opened a new issue #85:
URL: https://github.com/apache/incubator-annotator/issues/85



   In recent calls we (especially @tilgovi, so feel free to improve my 
description) have discussed an approach that would allow text selector 
matching/describing to be implemented on ‘document models’ other than the DOM. 
A typical use case would be a (web) application that uses some framework 
(ProseMirror, React, …) to display documents, and therefore would not want the 
result of anchoring an annotation to be a Range object, but rather something 
that matches its internal representation of the document.
   
   Another requirement we discussed is that the document can be provided 
piecemeal and asynchronously, so that an application can try to anchor 
selectors on documents that are not fully available yet (or just not fully 
converted to text yet, think e.g. PDF.js). We have been calling such pieces of 
text ‘chunks’ for now.
   
   Currently, our text quote anchoring function (in the dom package) is 
hard-coded to search for a text quote using Range, NodeIterator, and 
TreeWalker. With the chunk approach, this functionality would be composed of 
two parts: one generic text quote anchoring function that takes a stream of 
Chunks of text; and one dom-to-chunk converter that uses TreeWalkers and such 
to present the DOM as a stream of text Chunks.
   
   I am creating this issue to discuss what exactly a Chunk would be (a 
string?), what a stream of chunks would be (an `AsyncIterable<Chunk>`?), and 
how our generalised anchoring functions would interact with chunk providers 
(e.g. do we need an equivalent of Range, how do we pass back string offsets, 
…?). I would also like to discuss the assumptions and requirements (are we on 
the right track?).
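
   To make the questions above concrete, here is one possible shape, as a 
rough sketch only. All names in it (`Chunk`, `ChunkPosition`, `ChunkRange`, 
`QuoteMatcher`) are hypothetical and not part of the current code. It assumes 
a Chunk is an object with a `data` string, and that a match is reported as a 
pair of (chunk, offset) positions instead of a Range. For simplicity it 
buffers all text seen so far, which a real implementation would probably want 
to avoid.

```typescript
// Hypothetical shapes for discussion; none of these exist in the project.
interface Chunk {
  readonly data: string; // the text carried by this chunk
}

interface ChunkPosition<C extends Chunk> {
  chunk: C;
  offset: number; // index into chunk.data
}

// A Range-like result expressed in chunks instead of DOM nodes.
interface ChunkRange<C extends Chunk> {
  start: ChunkPosition<C>;
  end: ChunkPosition<C>; // exclusive
}

// Generic text quote matcher that is fed one chunk at a time.
class QuoteMatcher<C extends Chunk> {
  private readonly exact: string;
  private readonly seen: Array<{ chunk: C; start: number }> = [];
  private text = ""; // all text so far (simple, but not memory-efficient)
  private searchFrom = 0; // first start position not yet fully checked

  constructor(exact: string) {
    if (exact.length === 0) throw new Error("exact must not be empty");
    this.exact = exact;
  }

  // Add a chunk; return the matches that became complete with it.
  push(chunk: C): ChunkRange<C>[] {
    this.seen.push({ chunk, start: this.text.length });
    this.text += chunk.data;

    const found: ChunkRange<C>[] = [];
    let index = this.text.indexOf(this.exact, this.searchFrom);
    while (index !== -1) {
      found.push({
        start: this.locate(index, false),
        end: this.locate(index + this.exact.length, true),
      });
      index = this.text.indexOf(this.exact, index + 1);
    }
    // Earlier start positions cannot match any more, so skip them next time.
    this.searchFrom = Math.max(
      this.searchFrom,
      this.text.length - this.exact.length + 1,
    );
    return found;
  }

  // Translate an offset in the concatenated text to a chunk position.
  // An end offset on a chunk boundary stays in the preceding chunk.
  private locate(offset: number, asEnd: boolean): ChunkPosition<C> {
    for (let i = this.seen.length - 1; i >= 0; i--) {
      const entry = this.seen[i];
      if (entry && (asEnd ? entry.start < offset : entry.start <= offset)) {
        return { chunk: entry.chunk, offset: offset - entry.start };
      }
    }
    throw new Error("offset is before the first chunk");
  }
}
```

   A dom-to-chunk converter could then yield objects that extend `Chunk` with 
a reference to their Text node, and a consumer could feed the matcher from an 
`AsyncIterable<Chunk>` with a `for await` loop that calls `push` on each 
chunk. Whether positions should be expressed this way is exactly the kind of 
thing I would like to discuss.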


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
