On 30/07/2024 23:17, Benjamin Young wrote:
Hi all,
One constant use case/need I have is for a tool that could (probably very easily) be built
on Apache Annotator: a "bridge" between Text Fragments and Web Annotations.
https://developer.mozilla.org/en-US/docs/Web/Text_fragments
I'm currently writing tests for W3C Verifiable Credentials as part of the W3C
specification publication process. I did this once before, years ago, as part of the Web
Annotation Working Group. Both times, I wanted to associate each test with the relevant
quotation from the specification (and did/do so as much as possible each time).
Currently, I'm using Text Fragments:
https://github.com/w3c/vc-data-model-2.0-test-suite/blob/main/tests/10-vcdm2.js#L58-L60
What I lack is a way to check whether those quotations still match the current
specification (during a Working Group's lifetime, the text can change often).
The idea would be to take a list of URLs containing Text Fragments, turn them into an
Annotation Collection of annotations with TextQuoteSelector targets, run those through Apache
Annotator against the page (possibly in a headless DOM environment), and return
the list of quotes that were not found (aka "orphaned annotations").
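The first step of that idea (URLs in, Annotation Collection out) could look roughly like this. It is a minimal sketch under assumptions, not Apache Annotator code: the helper names are hypothetical, only `text=` directives are handled, and the `id` properties that the Web Annotation model requires are left out. The selector mapping mirrors the conversion proposed in the 2020 comment quoted further down, using the model's `startSelector`/`endSelector` property names.

```javascript
// Minimal sketch, not Apache Annotator API: all function names are hypothetical.

// Split a URL into the document URL and the raw values of its text directives
// (everything after ":~:" in the fragment, split on "&", keeping "text=" ones).
function splitTextFragmentUrl(url) {
  const hashIndex = url.indexOf('#');
  const source = hashIndex === -1 ? url : url.slice(0, hashIndex);
  const fragment = hashIndex === -1 ? '' : url.slice(hashIndex + 1);
  const delimiter = fragment.indexOf(':~:');
  if (delimiter === -1) return { source, directives: [] };
  const directives = fragment
    .slice(delimiter + 3)
    .split('&')
    .filter((part) => part.startsWith('text='))
    .map((part) => part.slice('text='.length));
  return { source, directives };
}

// Parse "[prefix-,]textStart[,textEnd][,-suffix]", percent-decoding each term.
function parseTextDirective(value) {
  const parts = value.split(',');
  const result = {};
  if (parts.length > 1 && parts[0].endsWith('-')) {
    result.prefix = decodeURIComponent(parts.shift().slice(0, -1));
  }
  if (parts.length > 1 && parts[parts.length - 1].startsWith('-')) {
    result.suffix = decodeURIComponent(parts.pop().slice(1));
  }
  if (parts.length > 2) throw new Error(`Unsupported text directive: ${value}`);
  result.textStart = decodeURIComponent(parts[0]);
  if (parts.length === 2) result.textEnd = decodeURIComponent(parts[1]);
  return result;
}

// Drop keys whose value is undefined, so absent prefixes/suffixes are omitted.
const compact = (obj) =>
  Object.fromEntries(Object.entries(obj).filter(([, v]) => v !== undefined));

// Mirrors the conversion in the quoted 2020 comment: a textStart/textEnd pair
// becomes a RangeSelector whose exclusive end is an empty quote after textEnd.
function directiveToSelector({ prefix, textStart, textEnd, suffix }) {
  if (textEnd === undefined) {
    return compact({ type: 'TextQuoteSelector', prefix, exact: textStart, suffix });
  }
  return {
    type: 'RangeSelector',
    startSelector: compact({ type: 'TextQuoteSelector', prefix, exact: textStart }),
    endSelector: compact({ type: 'TextQuoteSelector', prefix: textEnd, exact: '', suffix }),
  };
}

// Wrap one annotation per text directive in an AnnotationCollection
// (ids, which the Web Annotation model requires, are omitted for brevity).
function urlsToAnnotationCollection(urls) {
  const items = urls.flatMap((url) => {
    const { source, directives } = splitTextFragmentUrl(url);
    return directives.map((value) => ({
      type: 'Annotation',
      target: { source, selector: directiveToSelector(parseTextDirective(value)) },
    }));
  });
  return {
    '@context': 'http://www.w3.org/ns/anno.jsonld',
    type: 'AnnotationCollection',
    total: items.length,
    first: { type: 'AnnotationPage', items },
  };
}

// Example with a made-up URL.
const collection = urlsToAnnotationCollection([
  'https://example.com/spec#:~:text=The-,quick%20brown,fox&text=jumps,-over',
]);
console.log(JSON.stringify(collection, null, 2));
```

The matching step (anchoring each selector in the fetched page with Apache Annotator and collecting those that find nothing) is not shown, since it depends on the headless DOM setup. One caveat: a Text Fragment prefix may be separated from `textStart` by whitespace, whereas a TextQuoteSelector prefix must be immediately adjacent to the quote, so some quotes could be reported as orphans purely because of this mapping.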
My priorities continue to keep me elsewhere, but I thought I'd share in case
someone else had the same need or a burning interest in making something simple
with Apache Annotator. :)
Note previous discussion here:
https://github.com/apache/incubator-annotator/issues/60
I’ll quote my last comment from back then (November 2020):
We briefly discussed this topic in today’s call while looking over
the open issues. We agreed that just parsing the syntax is not of much
use, as it comes together with a specific algorithm for finding the
target text, which differs from the Web Annotation model.
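A small illustration of one such difference: a Text Fragment match must begin and end at word boundaries, while a TextQuoteSelector's `exact` is matched as a plain substring. The hypothetical `wordBoundedMatch` helper below approximates word boundaries by checking that the adjacent characters are not letters or digits; the actual spec relies on locale-aware word segmentation, so this is a simplification.

```javascript
// Hypothetical helper approximating Text Fragments' word-bounded matching:
// a hit counts only if the characters right before and after it are not
// letters or digits. (The spec itself uses locale-aware word segmentation.)
const isWordChar = (ch) => ch !== undefined && /[\p{L}\p{N}]/u.test(ch);

function wordBoundedMatch(text, quote) {
  if (!quote) return -1; // an empty quote would never advance the search
  for (let i = text.indexOf(quote); i !== -1; i = text.indexOf(quote, i + 1)) {
    if (!isWordChar(text[i - 1]) && !isWordChar(text[i + quote.length])) return i;
  }
  return -1;
}

const text = 'a banana and an apple';
// Plain substring search, as for a TextQuoteSelector's `exact`: first hit.
console.log(text.indexOf('an'));           // 3, inside "banana"
// Word-bounded search, closer to how a text directive is resolved.
console.log(wordBoundedMatch(text, 'an')); // 13, the standalone word "an"
```

So the same quote string can land on different text depending on which algorithm resolves it, which is why a syntax parser alone would not give a faithful bridge.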
A quick overview of things we could provide:
* anchoring of a fragment directive: I implemented the essence of
this already (see above comment); we could provide a function
that simply wraps my implementation. (we even discussed the
option of importing my whole implementation into this repo,
though to me it feels cleaner to keep these as separate projects)
* describing a selection (a Range or perhaps a list of Ranges) as
a fragment directive: this would need a custom adaptation of
|describeTextQuote|, modified to ensure that the total quote
(including prefix and suffix) begins and ends at word boundaries
(note that this, at least, is possible now, since a recent change
<https://github.com/WICG/scroll-to-text-fragment/pull/148> to
the spec). Also, it should use a |textStart,textEnd| pair (again
to be cut at word boundaries) instead of an exact quote when the
selection crosses block elements. There may be more hurdles, too.
* convert fragment directive ⇒ Selector: If the document is
available, we could simply anchor it and describe it in the
other format. Without the document at hand, we could also
convert it directly, although with a (hopefully small) risk that the
differences between the specifications will make it fail to anchor or
(worse) point at something else. I think the conversion could,
after syntax parsing, be done with more or less this simple code:

    ({ prefix, textStart, textEnd, suffix }) => textEnd
      ? {
          type: 'RangeSelector',
          startSelector: { type: 'TextQuoteSelector', prefix, exact: textStart },
          endSelector: { type: 'TextQuoteSelector', prefix: textEnd, exact: '', suffix },
        }
      : { type: 'TextQuoteSelector', prefix, exact: textStart, suffix }

(note the little hack of using |prefix: textEnd, exact: ''|
because RangeSelector’s end is exclusive and |textEnd| should
nevertheless be included in the target)
* convert Selector ⇒ fragment directive: the reverse of the above.
Again, if the document is available, we could simply anchor it
and describe it in the other format. But if the document is
not available, conversion in this direction would only be possible
if the selector has the type/shape shown in the above example
code.
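For the document-less reverse direction, a minimal sketch could accept only the two selector shapes produced by the conversion in the comment and refuse everything else. `selectorToTextDirective` and `encodeTerm` are hypothetical names; encoding `-`, `,` and `&` inside terms follows the Text Fragments syntax, where those characters act as delimiters. The result is just the `text=` directive, which would still need `#:~:` prepended after the document URL.

```javascript
// Percent-encode one directive term. encodeURIComponent already escapes ","
// and "&", but leaves "-" alone; "-" is syntax in a text directive, so escape it too.
const encodeTerm = (s) => encodeURIComponent(s).replace(/-/g, '%2D');

// Hypothetical converter: TextQuoteSelector, or the RangeSelector shape whose
// end is an empty quote with textEnd as its prefix, back to "text=...".
function selectorToTextDirective(selector) {
  let prefix, textStart, textEnd, suffix;
  if (selector.type === 'TextQuoteSelector') {
    ({ prefix, exact: textStart, suffix } = selector);
  } else if (
    selector.type === 'RangeSelector' &&
    selector.startSelector?.type === 'TextQuoteSelector' &&
    !selector.startSelector.suffix &&
    selector.endSelector?.type === 'TextQuoteSelector' &&
    selector.endSelector.exact === '' &&
    selector.endSelector.prefix
  ) {
    ({ prefix, exact: textStart } = selector.startSelector);
    ({ prefix: textEnd, suffix } = selector.endSelector);
  } else {
    return null; // any other shape cannot be converted without the document
  }
  const terms = [];
  if (prefix) terms.push(`${encodeTerm(prefix)}-`);
  terms.push(encodeTerm(textStart));
  if (textEnd !== undefined) terms.push(encodeTerm(textEnd));
  if (suffix) terms.push(`-${encodeTerm(suffix)}`);
  return `text=${terms.join(',')}`;
}

console.log(selectorToTextDirective({
  type: 'TextQuoteSelector', prefix: 'well-', exact: 'known', suffix: ', really',
})); // text=well%2D-,known,-%2C%20really
```

Returning null for unsupported shapes, rather than approximating them, avoids silently producing a directive that points at something else; when the document is available, anchoring and re-describing remains the safer route.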
I suppose it is mainly a matter of demand and priority whether we’ll
implement any of these. I might actually try to tackle some of these
points soon, as I would like to use these features myself.
I don’t think I ever tackled any of this (different priorities), and I
certainly have no time for it anymore.