Hi Jack,

> When you say you can't reproduce it, do you mean you get 14 results from
> running this script?

Yes, that’s what I meant.

What follows is fairly technical and specific, so feel free to skip ahead
to the examples.

Your updated example was helpful. It turns out that several issues combine
to produce the unexpected results. The core challenge is that ft:mark and
ft:extract only yield the expected results if the internally collected
full-text metadata is not lost at some stage of the internal processing,
and that can happen in many places that are hidden from the writer of the
query.

In your specific example, the full-text information gets lost because the
local:search function is too complex to be inlined by the compiler
(inlining enables further optimizations that eventually allow the metadata
to be propagated). You can tackle this by forcing the compiler to inline
your function:

  declare %basex:inline function local:search(...)
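
Applied to the function from your script, this could look as follows. It
is an untested sketch: the body is copied from your mail, and the only
change is the added %basex:inline annotation.

```xquery
(: Sketch only: Jack's original function with forced inlining :)
declare %basex:inline function local:search(
  $database as xs:string,
  $query    as xs:string
) {
  let $country-search := ft:search($database, $query)/ancestor::country
  let $city-search := ft:search($database, $query)
    /ancestor::city/ancestor::country
  let $other-search := ft:search($database, $query)
    /parent::*[name() = ('ethnicgroups', 'languages')]/ancestor::country
  let $country-mark :=
    $country-search[.//name[text() contains text { $query }]] => ft:mark()
  let $city-mark :=
    $city-search[.//city[text() contains text { $query }]] => ft:mark()
  let $other-mark :=
    $other-search[.//*[name() = ('ethnicgroups', 'languages')]
      [text() contains text { $query }]] => ft:mark()
  return ($country-mark, $city-mark, $other-mark)
};
local:search('factbook', 'German')//mark
```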

Using '(ethnicgroups, languages)' instead of 'name() = (...)' is another
practical tip: it helps the optimizer detect at compile time that the
metadata will be available at runtime. Another solution is to use
'local-name()' instead of 'name()' (local-name() does not depend on
namespaces that may occur in a database, which also affects how full-text
queries are evaluated).
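
To illustrate the local-name() alternative, here is an untested sketch
based on the script from my previous mail; the only change is that name()
has been replaced by local-name() in both predicates.

```xquery
(: Sketch only: element test via local-name() instead of name() :)
let $groups := ('ethnicgroups', 'languages')
let $database := 'factbook'
let $query := 'German'

let $search := ft:search($database, $query)/parent::*
  [local-name() = $groups]/ancestor::country
return ft:mark(
  $search[.//*[local-name() = $groups][text() contains text { $query }]]
)
```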

Here’s a variant that should work:

declare function local:search(
  $database  as xs:string,
  $query     as xs:string
) {
  let $country := ft:search($database, $query)/ancestor::country
  let $search := function($node) { $node/text() contains text { $query } }
  return (
    ft:mark($country[.//name[$search(.)]]),
    ft:mark($country[.//city[$search(.)]]),
    ft:mark($country[.//(ethnicgroups, languages)[$search(.)]])
  )
};
local:search('factbook', 'German')

…or…

  let $search := function($nodes) { $nodes[text() contains text { $query }] }
  return (ft:mark($country[$search(.//name)]), ...

From today’s perspective, we would certainly design ft:mark and ft:extract
so that the results are always correct. The consequence, however, would be
a much more restricted syntax.

Hope this helps,
Christian


On Thu, Feb 29, 2024 at 12:13 AM Jack Steyn <steynj...@gmail.com> wrote:

> Hi Christian,
>
> When I run your script, I do get 14 elements.
>
> When I run the following script I just get 12.
>
> <commands>
>   <set option='ftindex'>true</set>
>   <create-db name='factbook'>https://files.basex.org/xml/factbook.xml
> </create-db>
>   <xquery><![CDATA[
> declare function local:search(
>     $database as xs:string,
>     $query as xs:string
> ) {
>     let $country-search := ft:search($database, $query)/ancestor::country
>     let $city-search := ft:search($database,
> $query)/ancestor::city/ancestor::country
>     let $other-search := ft:search($database, $query)/parent::*[name() =
> ('ethnicgroups', 'languages')]/ancestor::country
>     let $country-mark := $country-search[.//name[text() contains text {
> $query }]] => ft:mark()
>     let $city-mark := $city-search[.//city[text() contains text { $query
> }]] => ft:mark()
>     let $other-mark := $other-search[.//*[name() = ('ethnicgroups',
> 'languages')][text() contains text { $query }]] => ft:mark()
>     return (
>         $country-mark,
>         $city-mark,
>         $other-mark
>     )
> };
>
> local:search('factbook', 'German')//mark
>   ]]></xquery>
> </commands>
>
> When you say you can't reproduce it, do you mean you get 14 results from
> running this script?
>
> Cheers,
>
> Jack
>
> On Thu, 29 Feb 2024, 1:02 am Christian Grün, <christian.gr...@gmail.com>
> wrote:
>
>> Hi Jack,
>>
>> Thanks for your observation.
>>
>>
>>> The first result of this query is the entry for Austria. I would expect
>>> both of the instances of the word 'German' in that entry to be surrounded
>>> by <mark> tags. However only the first instance is.
>>>
>>
>> I couldn’t reproduce this yet. Here’s a command script that returns 14
>> <mark>German</mark> elements:
>>
>> <commands>
>>   <set option='ftindex'>true</set>
>>   <create-db name='factbook'>https://files.basex.org/xml/factbook.xml
>> </create-db>
>>   <xquery><![CDATA[
>> let $groups := ('ethnicgroups', 'languages')
>> let $database := 'factbook'
>> let $query := 'German'
>>
>> let $search := ft:search($database, $query)/parent::*
>>   [name() = $groups]/ancestor::country
>> let $marked := ft:mark(
>>   $search[.//*[name() = $groups][text() contains text { $query }]]
>> )
>> return $marked//*[text() = 'German']
>>   ]]></xquery>
>> </commands>
>>
>> Could you check if you get the same result?
>>
>> Thanks in advance
>> Christian
>>
>>
