Treora opened a new pull request #99: URL: https://github.com/apache/incubator-annotator/pull/99
A TextQuoteSelector can include as much prefix and suffix as desired. Until now, we only added as much prefix and suffix as was strictly necessary to disambiguate the target from other occurrences of the exact same text in the same document.

When an annotation should still anchor on a modified version of the document, it can help to add a little more context. This makes it robust against the ambiguity that would arise if, after such a modification, the quoted text appears in more places than before.

It also seems neat to have the prefix and suffix contain whole words instead of stopping halfway through a word. This makes them pleasant to read when user interfaces expose the prefix and suffix. It also brings the implementation closer to being compatible with the WICG TextFragments spec (see #60).

This PR thus adds two ways to generate less minimal prefixes and suffixes:

1. Round them up to the next whitespace.
2. Optionally add a prefix and suffix around a short quote even if it is not ambiguous.

I made rounding up to whitespace the default behaviour; the previous behaviour can still be obtained using the option `minimalContext`. For the context around short quotes I am not sure what a good default would be (it might depend on the use case and the document length), so I left it at 0 for now, i.e. the feature is turned off by default.

This PR also refactors the implementation a bit, reusing the seekers instead of creating new ones for every match.

To pass options, I added an `options` object as the last function parameter. I thought we might want to move the `scope` parameter into this options object too, but `scope` is specific to the DOM implementation, so I’m not sure that is desirable. I added options for anything that would otherwise feel like hardcoding a ‘magic number’, but of course quite a few choices about how exactly the algorithm works are hardcoded opinions too.
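To make the two new behaviours more concrete, here is a minimal TypeScript sketch of how a quote description could be computed over a plain string. It is not the code in this PR: it ignores the DOM, the seekers and `scope`, and the function name `describeQuoteSketch`, the options `minimumQuoteLength` and `maxWordLength`, and their defaults are hypothetical. Only `minimalContext` is taken from the description above. Disambiguation uses a simple greedy rule (extend whichever side needs fewer characters), which is not guaranteed to produce the shortest possible context.

```typescript
// Hypothetical sketch, not the PR's implementation. Only `minimalContext` is an
// option name taken from the PR; the other names and defaults are assumptions.
interface TextQuoteSelector {
  type: 'TextQuoteSelector';
  exact: string;
  prefix: string;
  suffix: string;
}

interface DescribeQuoteOptions {
  minimalContext?: boolean; // true: keep only the strictly necessary context
  minimumQuoteLength?: number; // hypothetical: pad quotes shorter than this (0 = off)
  maxWordLength?: number; // hypothetical: cap on extra characters added by rounding
}

const isWhitespace = (c: string): boolean => /\s/.test(c);

function describeQuoteSketch(
  text: string,
  start: number,
  end: number,
  { minimalContext = false, minimumQuoteLength = 0, maxWordLength = 50 }: DescribeQuoteOptions = {},
): TextQuoteSelector {
  if (!(0 <= start && start < end && end <= text.length)) {
    throw new RangeError('expected a non-empty range within the text');
  }
  const exact = text.slice(start, end);
  let p = 0; // prefix length
  let s = 0; // suffix length

  // 1. Disambiguate from every other (possibly overlapping) occurrence of `exact`.
  for (let o = text.indexOf(exact); o !== -1; o = text.indexOf(exact, o + 1)) {
    if (o === start) continue;
    // Shortest prefix that differs from the text before `o`; Infinity if even the
    // whole text before the target cannot tell the two apart.
    let k = 0;
    while (k < start && k < o && text.charAt(start - 1 - k) === text.charAt(o - 1 - k)) k++;
    const needP = k < start ? k + 1 : Infinity;
    // Shortest suffix that differs from the text after this occurrence.
    const oEnd = o + exact.length;
    let m = 0;
    while (end + m < text.length && oEnd + m < text.length &&
           text.charAt(end + m) === text.charAt(oEnd + m)) m++;
    const needS = end + m < text.length ? m + 1 : Infinity;
    if (p >= needP || s >= needS) continue; // already ruled out
    // Greedy choice: grow whichever side needs fewer extra characters.
    if (needP - p <= needS - s) p = needP;
    else s = needS;
  }

  // 2. Optionally give short quotes context even if they are unambiguous.
  //    Assumption: the missing length is split evenly over prefix and suffix.
  if (exact.length < minimumQuoteLength) {
    const half = Math.ceil((minimumQuoteLength - exact.length) / 2);
    p = Math.max(p, Math.min(half, start));
    s = Math.max(s, Math.min(half, text.length - end));
  }

  // 3. Unless minimalContext is set, round non-empty context up to whitespace.
  if (!minimalContext) {
    // Extend the prefix leftwards until it begins at the text start,
    // with whitespace, or right after whitespace.
    let extra = 0;
    while (p > 0 && extra < maxWordLength && start - p > 0 &&
           !isWhitespace(text.charAt(start - p)) && !isWhitespace(text.charAt(start - p - 1))) {
      p++;
      extra++;
    }
    // Extend the suffix rightwards in the same way.
    extra = 0;
    while (s > 0 && extra < maxWordLength && end + s < text.length &&
           !isWhitespace(text.charAt(end + s - 1)) && !isWhitespace(text.charAt(end + s))) {
      s++;
      extra++;
    }
  }

  return {
    type: 'TextQuoteSelector',
    exact,
    prefix: text.slice(start - p, start),
    suffix: text.slice(end, end + s),
  };
}

// Usage: the second "cat" is disambiguated by the suffix " r".
const doc = 'the cat sat. the cat ran.';
console.log(describeQuoteSketch(doc, 17, 20, { minimalContext: true }).suffix); // " r"
console.log(describeQuoteSketch(doc, 17, 20).suffix); // " ran."
```

With the default options, the strictly necessary suffix ` r` is rounded up to ` ran.`, while `minimalContext: true` keeps the old, minimal result. Setting the hypothetical `minimumQuoteLength` above the quote length adds context around a unique quote, which stays off by default as in the PR.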
I hesitated between a few variations of the algorithm, but the one in this PR seemed the most straightforward, with, I hope, generally sensible results. That remains to be seen in practice, I guess.

I added basic tests for each of the new behaviours. Currently these tests are still in the dom package, but they should be refactored and moved into the selector package, as the actual algorithms being tested reside there.