Re: [basex-talk] Same query returns different amount of records

Sebastian Guerrero Mon, 18 May 2020 07:17:14 -0700

Hi Christian,

Thank you very much for your detailed answer; your comments are very
helpful.


*- Could you check once again if this is fixed with the new snapshot?*: I
can confirm that the problem is fixed with your new snapshot [1].

Thank you as well for your comments about duplicate paths. You're right:
the query performs better when each path is written only once. I've changed
it [2].

Regarding "*ft:search*" (and the full-text index in general), I've noticed
some "strange" behaviour when performing full-text searches.

I'll write about it in a separate thread, though, to keep this one focused.

Best regards,
Sebastian

[1] https://imgur.com/XnRdxyD
[2] https://imgur.com/U1wo4y3





On Thu, May 14, 2020 at 11:25 AM Christian Grün <christian.gr...@gmail.com>
wrote:

> Hi Sebastian,
>
> I couldn’t get this reproduced out of the box. A technical guess:
> Global full-text options may have been overwritten by
> database-specific properties in the second switch branch at compile,
> which yielded wrong/restricted results in the first branch at runtime.
>
> Could you check once again if this is fixed with the new snapshot [1]?
>
> Some more comments on your query: If you formulate duplicate paths
> only once, you might get even better performance:
>
> OLD:
>   for $a in A
>   where $a/B/C/D/E contains text { $text }
>   return $a/B/C/D/E
>
> NEW:
>   for $e in A/B/C/D/E
>   where $e contains text { $text }
>   return $e
>
> In a future version of BaseX, such patterns will automatically be
> rewritten. Currently, basic patterns are already simplified [2]:
>
>   for $e in A/B where $e/C/D return $e/C/D/E
>   → A/B[C/D]/C/D/E
>   → A/B/C/D/E
>
> The enforceindex option is still in a somewhat experimental stage (hence,
> thanks for your feedback), and its behavior is sometimes surprising if
> there are various competing candidates for index rewrites in your
> expression. If you want to have more control over how your queries are
> executed, you can directly call ft:search:
>
>     for $db in ('US00','US01','US02')
>     return ft:search($db, $text)[parent::mark-identification]
>
> If all 'mark-identification' elements occur on the same level in your
> document, you can omit the remaining parent steps (this will further
> speed up query evaluation). A look at the optimized query in the
> InfoView panel will give you some more hints.
>
> Cheers,
> Christian
>
> [1] http://files.basex.org/releases/latest/
> [2] https://github.com/BaseXdb/basex/issues/1864
>
>
>
> On Wed, May 13, 2020 at 11:23 PM Sebastian Guerrero <chap...@gmail.com>
> wrote:
> >
> > Hi everyone! It's me again.
> >
> > Here is my question:
> >
> > If I execute this query:
> >
> >              (# db:enforceindex #) {
> >                   for $db in ('US00','US01','US02')
> >                   for $tmUS in db:open($db)/trademark-applications-daily/application-information/file-segments/action-keys/case-file
> >                   where $tmUS/case-file-header/mark-identification/text() contains text { 'apple' }
> >                   return $tmUS/case-file-header/mark-identification/text()
> >                 }
> >
> > I get 4k results in 139 ms from three databases of 90 GB and 13M
> > records. It works like a charm [01].
> >
> > But if I wrap that query in a for loop and then in a switch (I tried
> > if-then-else too), the same query returns only 11 results in 107 ms [02]:
> >
> > declare namespace gb="http://www.ipo.gov.uk/schemas/tm";
> > let $text := "apple"
> > let $registries := ('US')
> >
> > for $registry in $registries
> > return
> >   switch ($registry)
> >
> >            case "US"
> >            return
> >            (# db:enforceindex #) {
> >                   for $db in ('US00','US01','US02')
> >                   for $tmUS in db:open($db)/trademark-applications-daily/application-information/file-segments/action-keys/case-file
> >                   where $tmUS/case-file-header/mark-identification/text() contains text { $text }
> >                   return $tmUS/case-file-header/mark-identification/text()
> >                 }
> >
> >            case "GB"
> >            return
> >            (# db:enforceindex #) {
> >                   for $tmGB in db:open('GB')/gb:MarkLicenceeExportList/gb:TradeMark
> >                   where $tmGB/gb:WordMarkSpecification/gb:MarkVerbalElementText/text() contains text { $text }
> >                   return $tmGB/gb:WordMarkSpecification/gb:MarkVerbalElementText/text()
> >                 }
> >
> >            default return "Unknown registry code"
> >
> >
> >
> > I noticed that if I remove the "GB" case (even though it's never
> > evaluated), the query works fine and returns the 4k records [03]:
> >
> > declare namespace gb="http://www.ipo.gov.uk/schemas/tm";
> > let $text := "apple"
> > let $registries := ('US')
> >
> > for $registry in $registries
> > return
> >   switch ($registry)
> >
> >            case "US"
> >            return
> >            (# db:enforceindex #) {
> >                   for $db in ('US00','US01','US02')
> >                   for $tmUS in db:open($db)/trademark-applications-daily/application-information/file-segments/action-keys/case-file
> >                   where $tmUS/case-file-header/mark-identification/text() contains text { $text }
> >                   return $tmUS/case-file-header/mark-identification/text()
> >                 }
> >
> >            default return "Unknown registry code"
> >
> >
> >
> > What am I missing here? Is this the expected behaviour?
> >
> > Best regards,
> > Sebastian
> >
> > [01] https://imgur.com/o4RUUyO
> > [02] https://imgur.com/533c0rI
> > [03] https://imgur.com/mCb3qEe
>
