On 30/07/2024 23:17, Benjamin Young wrote:
Hi all,

One recurring need I have is for a tool that could (probably very easily) be built 
on Apache Annotator: a "bridge" between Text Fragments and Web Annotations.
https://developer.mozilla.org/en-US/docs/Web/Text_fragments

I'm currently writing tests for W3C Verifiable Credentials as part of the W3C 
specification publication process. I'd done this before as part of the Web 
Annotation Working Group years ago. Both times, I wanted (and did/do as much as 
possible each time) to associate each test with the quotation from the 
specification.

Currently, I'm using Text Fragments:
https://github.com/w3c/vc-data-model-2.0-test-suite/blob/main/tests/10-vcdm2.js#L58-L60

What I lack is a way to check whether those quotations still match the 
current specification (during a Working Group's lifetime, the text can change 
often).

The idea would be to take a list of URLs containing Text Fragments and turn them into an 
Annotation Collection of text quote selection annotations, then run those through Apache 
Annotator on the page (possibly with the use of a headless DOM environment), and return 
the list of quotes that were not found (aka "orphaned annotations").
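
A rough sketch of that pipeline, under stated assumptions: the function names below are hypothetical, the page text is assumed to have been extracted already (for instance from a headless DOM), and the matching step is a plain substring search standing in for real anchoring. It is not Apache Annotator's API, and it ignores prefix and suffix, so it can only report quotes that are missing outright.

```javascript
// Parse the value of one `text=` directive into its optional parts.
// Syntax: [prefix-,]textStart[,textEnd][,-suffix], each part percent-encoded.
function parseTextDirective(value) {
  const parts = value.split(',');
  const result = {};
  if (parts.length > 1 && parts[0].endsWith('-')) {
    result.prefix = decodeURIComponent(parts.shift().slice(0, -1));
  }
  if (parts.length > 1 && parts[parts.length - 1].startsWith('-')) {
    result.suffix = decodeURIComponent(parts.pop().slice(1));
  }
  if (parts.length > 2) {
    throw new Error(`Malformed text directive: ${value}`);
  }
  result.textStart = decodeURIComponent(parts[0]);
  if (parts.length === 2) result.textEnd = decodeURIComponent(parts[1]);
  return result;
}

// Collect every text directive found after the `:~:` delimiter of a URL.
function extractTextDirectives(url) {
  const delimiter = ':~:';
  const index = url.indexOf(delimiter);
  if (index === -1) return [];
  return url
    .slice(index + delimiter.length)
    .split('&')
    .filter((directive) => directive.startsWith('text='))
    .map((directive) => parseTextDirective(directive.slice('text='.length)));
}

// Assumption: whitespace-normalised, case-insensitive comparison is close
// enough for a first pass. A real tool would anchor selectors instead.
const normalise = (text) => text.replace(/\s+/g, ' ').trim().toLowerCase();

// Stand-in for anchoring: textStart must occur, and textEnd (if any) after it.
function isFound(pageText, { textStart, textEnd }) {
  const haystack = normalise(pageText);
  const start = normalise(textStart);
  const startIndex = haystack.indexOf(start);
  if (startIndex === -1) return false;
  if (textEnd === undefined) return true;
  return haystack.indexOf(normalise(textEnd), startIndex + start.length) !== -1;
}

// Return the directives (with their source URL) that no longer match the page.
function findOrphans(urls, pageText) {
  const orphans = [];
  for (const url of urls) {
    for (const directive of extractTextDirectives(url)) {
      if (!isFound(pageText, directive)) orphans.push({ url, ...directive });
    }
  }
  return orphans;
}
```

Calling `findOrphans(listOfUrls, specText)` would then give the list of quotes to review by hand. Swapping `isFound` for proper selector anchoring is where Apache Annotator would come in.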

My priorities continue to keep me elsewhere, but I thought I'd share in case 
someone else had the same need or a burning interest in making something simple 
with Apache Annotator. :)

Note previous discussion here: https://github.com/apache/incubator-annotator/issues/60

I’ll quote my last comment from back then (November 2020):

   We briefly discussed this topic in today’s call while looking over
   the open issues. We agreed that just parsing the syntax is not much
   use, as it comes together with a specific algorithm for finding the
   target text, which differs from the Web Annotation model.

   A quick overview of things we could provide:

     * anchoring of a fragment directive: I implemented the essence of
       this already (see above comment); we could provide a function
       that simply wraps my implementation. (we even discussed the
       option of importing my whole implementation into this repo,
       though to me it feels cleaner to keep these as separate projects)

     * describing a selection (a Range or perhaps a list of Ranges) as
       a fragment directive: this would need a custom adaptation of
       |describeTextQuote|, modified to ensure that the total quote
       (including prefix and suffix) ends at word boundaries (note that
       at least this is possible now, since a recent change
       <https://github.com/WICG/scroll-to-text-fragment/pull/148> in
       the spec). Also, it should use a |textStart,textEnd| pair (again
       to be cut at word boundaries) instead of an exact quote when the
       selection crosses block elements. And perhaps there are more
       hurdles.

     * convert fragment directive ⇒ Selector: If the document is
       available, we could simply anchor it and describe it in the
       other format. Without the document at hand, we could also
       convert it, although with a (hopefully small) risk that the
       differences in specifications will make it fail to anchor or
       (worse) point at something else. I think the conversion could,
       after syntax parsing, be done with more or less this simple code:

           // Map parsed text-directive parts to a Web Annotation selector.
           ({ prefix, textStart, textEnd, suffix }) =>
             textEnd
               ? {
                   type: 'RangeSelector',
                   start: { type: 'TextQuoteSelector', prefix, exact: textStart },
                   end: { type: 'TextQuoteSelector', prefix: textEnd, exact: '', suffix },
                 }
               : { type: 'TextQuoteSelector', prefix, exact: textStart, suffix }

       (note the little hack of using |prefix: textEnd, exact: ''|
       because RangeSelector’s end is exclusive and |textEnd| should
       nevertheless be included in the target)

     * convert Selector ⇒ fragment directive: the reverse of the above.
       Again, if the document is available, we could simply anchor it
       and describe it in the other format. But in case the document is
       not available, conversion in this direction would only be
       possible if the selector is of the type/shape shown in the above
       example code.
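
For the last item, a minimal sketch of the document-less conversion Selector ⇒ fragment directive, limited to the two shapes the example code produces. The name `selectorToTextDirective` is hypothetical, and the sketch does not enforce the word-boundary requirement mentioned earlier, so its output may still fail to anchor in a browser. It percent-encodes `-` by hand because `encodeURIComponent` leaves that character untouched, while the directive syntax uses it to mark prefix and suffix.

```javascript
// Percent-encode one directive part, including '-', which encodeURIComponent
// does not escape but which delimits prefix/suffix in a text directive.
const encodePart = (text) => encodeURIComponent(text).replace(/-/g, '%2D');

// Returns a `text=...` directive string, or null when the selector does not
// have one of the two convertible shapes.
function selectorToTextDirective(selector) {
  let parts;
  if (selector.type === 'TextQuoteSelector') {
    parts = {
      prefix: selector.prefix,
      textStart: selector.exact,
      suffix: selector.suffix,
    };
  } else if (
    selector.type === 'RangeSelector' &&
    selector.start?.type === 'TextQuoteSelector' &&
    selector.end?.type === 'TextQuoteSelector' &&
    !selector.start.suffix && // cannot be expressed in a text directive
    selector.end.exact === '' && // the "exclusive end" hack, reversed
    selector.end.prefix
  ) {
    parts = {
      prefix: selector.start.prefix,
      textStart: selector.start.exact,
      textEnd: selector.end.prefix,
      suffix: selector.end.suffix,
    };
  } else {
    return null;
  }
  if (!parts.textStart) return null;

  const pieces = [];
  if (parts.prefix) pieces.push(`${encodePart(parts.prefix)}-`);
  pieces.push(encodePart(parts.textStart));
  if (parts.textEnd) pieces.push(encodePart(parts.textEnd));
  if (parts.suffix) pieces.push(`-${encodePart(parts.suffix)}`);
  return `text=${pieces.join(',')}`;
}
```

Any other selector (a CssSelector, a TextPositionSelector, a RangeSelector with different end points) yields `null` here and would need the document at hand.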

   I suppose it is mainly a matter of demand and priority whether we’ll
   implement any of these. I might actually try to tackle some of these
   points soon, as I would like to use these features myself.

I don't think I tackled any of this (different priorities), and I certainly have no time for it now.
