Hi Adam,

Thanks for writing to the list.

After having given you a quick reply in private, I have double-checked your new use cases, and once again I realized that it’s the complex specification rules that lead to behavior that’s difficult to grasp. I’ll try to make it short:

This query returns true:

  'A x B' contains text { 'A', 'B' } all distance at most 1 words

The following query returns false because 'A B' is treated as a single search term:

  'A x B' contains text 'A B' all distance at most 1 words

The following query returns true. It’s actually equivalent to the first query: Due to “all words”, the single string will be tokenized into independent search terms.

Things get freaky with the next use case:

  'A x B x x B' contains text { 'A', 'B' } all distance at most 1 words

The query creates three string matches: 1 for “A” and 2 for “B”. The specification states: “When a distance selection applies a distance condition to more than two matches, the distance condition is required to hold on each successive pair of matches.” [1]. In our case, the rule does not hold on the last pair (the distance between “B” and “B” is too large).

There is one way out, and it’s the usage of 'ftand':

  'A x B x x B' contains text 'A' ftand 'B' all words distance at most 1 words

This query will return two “full-text matches”, each containing two “string matches”, and the check will be successful if at least one full-text match is successful. In this case, it’s the first full-text match, which contains string matches for “A” and “B”, which are at most one word distant from each other.

Confused? I guess so ;)

The current implementation of ft:search does not allow you to explicitly combine search strings with ftand/ftor/ftnot. I’ll have some more thoughts on what we could do here. Until then, feel free to try your luck with the static full-text syntax.
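For convenience, the queries discussed above can be collected into a single expression and run in one go. This is only a sketch: the result noted next to each query is the one described in this message, not an independently verified output. The third entry is an assumption. Its query text does not appear in the message, so the form shown here is a guess based on the remark about “all words” and may differ from what was intended.

```xquery
(: Each item is one boolean result; the comments repeat what the
   message says about each query. :)
(
  (: described as true: 'A' and 'B' are two separate search terms :)
  'A x B' contains text { 'A', 'B' } all distance at most 1 words,

  (: described as false: 'A B' is treated as a single search term :)
  'A x B' contains text 'A B' all distance at most 1 words,

  (: ASSUMPTION, reconstructed: "all words" tokenizes 'A B' into
     independent search terms, so this should behave like the first query :)
  'A x B' contains text 'A B' all words distance at most 1 words,

  (: described as false: the distance must hold on each successive pair
     of matches, and the two 'B' matches are too far apart :)
  'A x B x x B' contains text { 'A', 'B' } all distance at most 1 words,

  (: described as true: 'ftand' yields two full-text matches, and one
     of them satisfying the distance condition is enough :)
  'A x B x x B' contains text 'A' ftand 'B' all words distance at most 1 words
)
```

The same static syntax can be used in a predicate on text nodes, which is where ftand/ftor/ftnot become available.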
Cheers,
Christian

[1] https://www.w3.org/TR/xpath-full-text-10/#ftdistance


On Wed, Sep 16, 2020 at 12:13 PM Adam Law <adamjames...@gmail.com> wrote:
>
> (:
> Hello - I saw some discussions about full text indexing. Dumbo here cannot
> work out how distance matching works.
> Also is it somehow possible to traverse up and down a full-index word list
> from a hit position rather than having to spend time say reversing strings.
> Is this is not possible due to how the word indexing works? If so can I
> preprocess
> https://github.com/pierrec/lz4/blob/master/fuzz/corpus/Mark.Twain-Tom.Sawyer_long.txt
> to break it into words.
> I was considering making something like DTSearch (but more flexible) before I
> realised how difficult this is.
> Many thanks
> Dumbo
> :)
>
> import module namespace functx = 'http://www.functx.com';
> (:fulltext <x><content><![CDATA[
> https://github.com/pierrec/lz4/blob/master/fuzz/corpus/Mark.Twain-Tom.Sawyer_long.txt
> ]]></content></x>:)
>
> (:$f - return all nodes containing queer and enterprises:)
> let $f := <x>{ft:search('xvue_textIndex',("enterprises", "queer"), map {
> 'mode': 'all' })/parent::* }</x>
>
> let $options1 := map { 'mode': 'all'}
> let $options2 := map { 'mode': 'all', "distance": map { "max": "5","unit":
> "words" }}
> let $options3 := map { 'mode': 'all words', "distance": map { "max":
> "5","unit": "words" }}
> let $options4 := map { 'mode': 'all words', "distance": map { "max": 5,
> "unit": "words" }}
> let $options := $options1 (:Why wont others work:)
>
> (:$g - mark words queers and enterprises. Can't get options:2,3,4 to work:)
> let $g := <y>{ft:mark( $f//*[ft:contains(text(), ('queer','enterprises'),
> $options)], 'mark')}</y>
>
> (:Hopeful - with distancing will this result in <mark>queer
> enterprises</mark>. Otherwise I have to postprocess more:)
>
> (:Unsure about how to return words before and words after using fulltext.
> Have to limit to characters:)
> (:Ideally, I would like to be able to specify words after and before:)
> let $charbefore := 30
> let $charafter := 30
>
> (:This takes a while because I am string joining large
> preceding-sibling:nodes() (sometimes text() and sometimes marked/text()) to
> return words in context:)(:Three seconds:)
> (:Is there a fulltext way of doing this that is faster eg traverse a word
> list by match position:)
> let $h := for $w in $g//mark
> return <a><preceedingWords>{
> functx:reverse-string (substring(
>
> functx:reverse-string(string-join($w/preceding-sibling::node())),0,$charbefore))}</preceedingWords><match>{$w/text()}</match><followingWords>{substring(string-join($w/following-sibling::node()
> ),0,$charafter)}</followingWords></a>
>
> return $h
>
> (:
> $h :=
> <a>
> <preceedingWords>thought and talked,
> and what </preceedingWords>
> <match>queer</match>
> <followingWords> enterprises they sometimes e</followingWords>
> </a>
> <a>
> <preceedingWords>t and talked,
> and what queer </preceedingWords>
> <match>enterprises</match>
> <followingWords> they sometimes engaged in.
>
> </followingWords>
> </a>
>
> Sorry about the length of this.
> :)