Treora opened a new pull request #99:
URL: https://github.com/apache/incubator-annotator/pull/99


   A TextQuoteSelector can carry as much prefix and suffix as desired. Until now, 
we added only as much prefix and suffix as was strictly necessary to 
disambiguate the target from other occurrences of the exact same text in the 
same document. When an annotation should still anchor on a modified version of 
the document, it can help to add a little more context, to stay robust 
against the ambiguity that would arise if, after such a modification, the 
quoted text appears in more places than before.
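
   To make the "strictly necessary" baseline concrete, here is a minimal sketch 
of one way to find the smallest disambiguating context. It is not the code from 
this PR: the names `countOccurrences` and `minimalQuote` are hypothetical, and 
growing prefix and suffix symmetrically by one character at a time is an 
assumption made for brevity.

```typescript
// Hypothetical illustration, not the PR's seeker-based implementation.

interface Quote {
  prefix: string;
  exact: string;
  suffix: string;
}

// Count occurrences of `needle` in `text`, including overlapping ones.
function countOccurrences(text: string, needle: string): number {
  if (needle === "") return 0;
  let count = 0;
  let from = text.indexOf(needle);
  while (from !== -1) {
    count++;
    from = text.indexOf(needle, from + 1);
  }
  return count;
}

// Grow the context around text[start, end) one character at a time on each
// side until prefix + exact + suffix occurs only once in the document.
function minimalQuote(text: string, start: number, end: number): Quote {
  let contextStart = start;
  let contextEnd = end;
  // Terminates: once the context spans the whole text, it occurs exactly once.
  while (countOccurrences(text, text.slice(contextStart, contextEnd)) > 1) {
    if (contextStart > 0) contextStart--;
    if (contextEnd < text.length) contextEnd++;
  }
  return {
    prefix: text.slice(contextStart, start),
    exact: text.slice(start, end),
    suffix: text.slice(end, contextEnd),
  };
}
```

   With such a minimal selector, a later edit that introduces another copy of 
the same short context makes the quote ambiguous again, which is the case the 
extra context is meant to guard against.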
   
   Also, it seems neat to have the prefix and suffix contain whole words 
instead of stopping halfway inside a word. This makes them pleasant to read when 
user interfaces expose the prefix and suffix. It also brings the implementation 
closer to being compatible with the WICG Text Fragments spec (see #60).
   
   This PR thus adds two ways to generate less minimal prefixes and suffixes:
   
    1. Round them up to the next whitespace.
    2. Optionally add prefix and suffix around a short quote even if it is not
        ambiguous.
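
   As an illustration of the first point, the following sketch rounds a context 
range outward to whitespace. The helper name `roundContextToWhitespace` and its 
index-based signature are assumptions for this example, not the PR's API, and 
treating the document edge as a boundary is likewise an assumption.

```typescript
// Hypothetical helper, not taken from the PR: widen the range
// [contextStart, contextEnd) until both ends sit on a whitespace boundary
// or on the edge of the document.
function roundContextToWhitespace(
  text: string,
  contextStart: number, // index where the minimal prefix starts
  contextEnd: number, // index just past the end of the minimal suffix
): [number, number] {
  let start = contextStart;
  // Move left while the character before `start` is not whitespace.
  while (start > 0 && !/\s/.test(text.charAt(start - 1))) start--;
  let end = contextEnd;
  // Move right while the character at `end` is not whitespace.
  while (end < text.length && !/\s/.test(text.charAt(end))) end++;
  return [start, end];
}

// Usage: the minimal context "ck brown f" grows to whole words.
const sample = "the quick brown fox";
const [roundedStart, roundedEnd] = roundContextToWhitespace(sample, 7, 17);
console.log(sample.slice(roundedStart, roundedEnd)); // "quick brown fox"
```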
   
   I made rounding up to whitespace the default behaviour, while the previous 
behaviour can still be obtained using the option `minimalContext`. For the 
context around short quotes I do not know what a good default would be 
(it might depend on use case and document length?), so I left it at 0 for now, 
i.e. the feature is turned off by default.
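
   A rough sketch of how these defaults could be resolved follows. Only 
`minimalContext` is a name from this PR; `minimumQuoteLength`, the helper 
names, and the reading that the numeric option is a length threshold are 
assumptions for illustration.

```typescript
interface ContextOptions {
  // Named in the PR: true restores the previous, strictly minimal behaviour.
  minimalContext?: boolean;
  // Hypothetical name: quotes shorter than this get context even when unique.
  // A value of 0 turns the feature off.
  minimumQuoteLength?: number;
}

// Fill in the defaults described above: round to whitespace, no extra
// context around short quotes.
function resolveContextOptions(
  options: ContextOptions = {},
): Required<ContextOptions> {
  return {
    minimalContext: options.minimalContext ?? false,
    minimumQuoteLength: options.minimumQuoteLength ?? 0,
  };
}

// Decide whether a quote should get any prefix and suffix at all.
function wantsContext(
  exact: string,
  isAmbiguous: boolean,
  options: ContextOptions = {},
): boolean {
  const { minimumQuoteLength } = resolveContextOptions(options);
  return isAmbiguous || exact.length < minimumQuoteLength;
}
```

   With the default of 0, `wantsContext` reduces to the ambiguity check alone, 
which matches the feature being off unless a caller opts in.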
   
   This PR also refactors the implementation a bit, reusing the seekers instead 
of creating new ones on every match.
   
   To pass options, I added an `options` object as the last function parameter. 
I thought we might want to move the `scope` parameter into this options object 
too, but `scope` is specific to the DOM implementation, so I’m not sure if that 
is desirable.
   
   I added options for anything that would otherwise feel like we’re hardcoding 
a ‘magic number’, but of course quite a few choices about how exactly the 
algorithm works are hardcoded opinions too. I wavered between a few variations, 
but this one seemed the most straightforward, with, I hope, generally sensible 
results. That remains to be seen in practice, I guess.
   
   I added basic tests for each of the new behaviours. Currently these tests 
are still in the dom package, but they should be refactored and moved into the 
selector package, as the actual algorithms being tested reside there.


