Treora opened a new pull request #79:
URL: https://github.com/apache/incubator-annotator/pull/79


   See #77. This branch tests description and anchoring of text quote 
selectors. It does not attempt to exhaust every possible input, nor cover every 
branch of the code, but uses several example (edge) cases that may likely cause 
a mistaken implementation to misbehave (‘*pre*gression testing’? :)). In fact I 
had to make quite some changes in our implementations to make them behave as I 
would expect.
   
   A few of these expectations are opinionated and desired behaviour may be 
discussed:
   
   - when trying to describe a range that extends beyond its scope, one could 
argue describeTextQuote should throw an error; or that it should ignore the 
parts that fall outside of the scope. I chose the latter for now.
   - where should an empty quote match? Empty quotes may actually be useful 
when combined with a prefix/suffix, in order to point at a specific point in 
the text (e.g. to point out where to insert a comma). But an empty quote 
without prefix/suffix matches at every string position, which may be unwieldy 
but seems the only coherent behaviour. (matching for a single character can 
result in an unwieldy amount of matches too, so this should not really be an 
issue)
   - should prefix and suffix be undefined or the empty string when not needed? 
I changed it to the latter now in order to follow the spec more closely: *“Each 
TextQuoteSelector SHOULD have exactly 1 prefix property”* (and same for suffix)
   
   Other changes along the way:
   
   - dropping the `dom-node-iterator` polyfill
   - updating `dom-seek` to v5 (I think we could simplify our code by moving 
some of the counting logic into that module instead, or build some other 
abstractions around ‘text node ranges’, but left this for another time)
   - `describeTextQuote` used to run the actual text quote matcher to find out 
if the selected quote is ambiguous in the given scope. Trying to simplify the 
code a little, I made it run this search for duplicates itself to avoid the 
needless conversions between ranges and string indexes; though we may want to 
return to the original approach some time to allow generalisation to other 
matchers.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to