Re: [basex-talk] repeatedly full-text marking the same text node

Loren Cahlander Sun, 10 May 2020 08:41:09 -0700

Take a look at exist-stanford-nlp on my GitHub, in particular the code for 
named entity recognition:

https://github.com/lcahlander/exist-stanford-nlp/blob/master/src/main/xquery/ner-module.xqm

Loren Cahlander

On May 10, 2020, at 10:13 AM, Graydon <graydon...@gmail.com> wrote:

On Sun, May 10, 2020 at 03:35:45AM -0400, Liam R. E. Quin scripsit:
>> On Fri, 2020-05-08 at 14:52 -0400, Graydon Saunders wrote:
>> The idea would be to iterate through the list, marking up the node
>> with any matches.
> 
> Can you instead use standoff markup? E.g. store positions of start and
> end as word counts, and then merge them later?

In principle, yes.  But then I would have to be smart and extract the
positions correctly somehow and then get all the positional arithmetic
correct.
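
For what the positional arithmetic might look like, here is a rough sketch
in Python (not XQuery), assuming positions are counted in
whitespace-separated words as suggested above. The names tokenize,
find_spans, and merge are hypothetical, and punctuation attached to a word
is not handled.

```python
def tokenize(text):
    # Splitting on any whitespace makes line breaks and tabs irrelevant.
    return text.split()


def find_spans(words, term):
    """Return (start, end) word offsets, end exclusive, where term occurs."""
    target = [w.lower() for w in term.split()]
    n = len(target)
    if n == 0:
        return []
    lowered = [w.lower() for w in words]
    return [(i, i + n)
            for i in range(len(lowered) - n + 1)
            if lowered[i:i + n] == target]


def merge(spans):
    """Merge overlapping spans so each word is marked at most once."""
    merged = []
    for start, end in sorted(spans):
        if merged and start < merged[-1][1]:
            # Overlaps the previous span: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


words = tokenize("the full text\tindex and\ntext index")
spans = find_spans(words, "full text index") + find_spans(words, "text index")
print(merge(spans))
```

The merged spans would still have to be mapped back onto the original text
node, which is the part that is easy to get wrong.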

The attraction of the full-text index was a combination of speed and
being able to let some other smarter person handle the "does the match
still work if there's a line break? bunches of tabs?" issues.

I now think this just isn't a full-text use case; I was trying to think
of a way to use something optimized for single-pass search to support
recursion on the changed content and that loses all the attractive
optimizations.  Nothing says I can't use analyze-string and recursion.
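
A minimal sketch of that regex-plus-recursion idea, written in Python rather
than XQuery so it can be run standalone. The names term_pattern, mark, and
render are hypothetical, as is the <mark term="..."> output element; none of
this is BaseX API. Each term becomes a regex in which any run of whitespace
may separate the words, which covers the line-break and tab cases. The text
around each match is handed on recursively to the remaining terms, so text
that is already marked is never searched again.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def term_pattern(term):
    # Allow any run of whitespace (spaces, tabs, line breaks) between words.
    words = [re.escape(w) for w in term.split()]
    return re.compile(r"\s+".join(words), re.IGNORECASE)


def mark(text, terms):
    """Split text into plain strings and (term, matched_text) tuples."""
    if not text:
        return []
    if not terms:
        return [text]
    first, rest = terms[0], terms[1:]
    segments = []
    pos = 0
    for m in term_pattern(first).finditer(text):
        # Unmatched text before this hit is tried against the remaining terms.
        segments.extend(mark(text[pos:m.start()], rest))
        segments.append((first, m.group(0)))
        pos = m.end()
    # The same goes for the tail after the last hit.
    segments.extend(mark(text[pos:], rest))
    return segments


def render(segments):
    """Serialize the segments as an XML fragment."""
    return "".join(
        escape(s) if isinstance(s, str)
        else f"<mark term={quoteattr(s[0])}>{escape(s[1])}</mark>"
        for s in segments
    )


terms = ["full text index", "text index"]  # assumed non-empty, longest first
print(render(mark("full  text\nindex and text index", terms)))
```

The order of the list decides which term wins when two could match the same
text. The sketch does not enforce word boundaries, so "text index" would also
match inside "context index".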

Thanks!

-- Graydon
