Hi Giuseppe,

Your summary is 100% correct, and surely helpful to other users.

An even better option than using data() is the ENFORCEINDEX option,
which was added with BaseX 9. It allows you to enforce index
rewritings for the whole query, or for specific comparisons or
predicates.
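
As a rough sketch of the two scopes: the option can be wrapped around a single expression as a pragma or, assuming the usual BaseX option declaration syntax, set for the whole query in the prolog. The database name "persons" and its element names are placeholders and do not come from this thread.

```xquery
(: Sketch only: "persons" is a placeholder database with a text index. :)

(: Whole query (assumed prolog form), shown here as a comment:
   declare option db:enforceindex "true"; :)

(: Single expression: the pragma applies only to the enclosed code. :)
(# db:enforceindex #) {
  db:open("persons")//person[name/text() = "John"]
}
```

The optimized query shown in the Info panel is the place to check whether the rewriting was actually applied.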

I noticed that the option was restricted to comparisons with static
values, and I have just lifted this restriction. With the latest
snapshot, the following query should be faster than the original version:

  for $s in db:open("ru_syntagrus-ud-dev")//s//t
  for $d in db:open("UD_Russian-SynTagRus")//case
  where (# db:enforceindex #) { $d/verb_lemma = $s/@l }
    and $d//verb_form/@value = $s/@f
    and $d/aspect-values/@sign = "yes"
  return $s
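
For comparison, the data() workaround from the summary quoted below would look roughly like this. It is only a sketch of that earlier approach, using the same database names as the query above: data() keeps the static comparison out of the index rewriting, so one of the other two conditions can be picked instead.

```xquery
(: Sketch of the data() workaround; database names as in the query above. :)
for $s in db:open("ru_syntagrus-ud-dev")//s//t
for $d in db:open("UD_Russian-SynTagRus")//case
where $d/verb_lemma = $s/@l
  and $d//verb_form/@value = $s/@f
  (: data() suppresses the index rewriting for this static comparison :)
  and data($d/aspect-values/@sign) = "yes"
return $s
```

The pragma version states the intent more directly: it names the comparison that should be rewritten instead of blocking the one that should not be.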

I have added some words on this enhancement in the documentation [1].
Feel free to check out the new snapshot [2],
Christian

[1] http://docs.basex.org/wiki/Indexes#Enforce_Rewritings
[2] http://files.basex.org/releases/latest/



On Thu, May 24, 2018 at 11:18 AM, Giuseppe Celano
<cel...@informatik.uni-leipzig.de> wrote:
> Hi Christian,
>
> Thank you for your help! To summarize (also for the benefit of other users):
> in XQuery, data($d/aspect-values/@sign) = "yes" and
> $d/aspect-values/@sign = "yes" are equivalent because of atomization, but in
> BaseX, wrapping the operand in data() prevents a certain index from being
> used (so this is a BaseX-specific feature). Paying attention to how BaseX
> uses indexes (which can be seen in the GUI Info panel) seems particularly
> important when documents are joined. As far as I understand, it cannot be
> predicted in advance which indexes BaseX will choose automatically and how
> it will use them, so one can try the data() function to test which index
> use turns out to be best (especially when the query evaluates slowly).
>
> Is this correct?
>
> Thank you again!
> Giuseppe
>
>
> Universität Leipzig
> Institute of Computer Science, NLP
> Augustusplatz 10
> 04109 Leipzig
> Deutschland
> E-mail: cel...@informatik.uni-leipzig.de
> E-mail: giuseppegacel...@gmail.com
> Web site 1: http://www.dh.uni-leipzig.de/wo/team/
> Web site 2: https://sites.google.com/site/giuseppegacelano/
>
> On May 23, 2018, at 2:44 PM, Christian Grün <christian.gr...@gmail.com>
> wrote:
>
> Hi Giuseppe,
>
> I think your observation was related to another issue that has already
> been fixed recently. Did you try the latest snapshot [1]?
>
> Btw, in your specific query I noticed that data() may indeed be
> helpful to suppress the index rewriting for the last condition. As
> it’s the only one that has a static comparison string, it is the
> one that will be chosen for index access, but for your data, it is
> actually better if one of the other two conditions is evaluated
> by the index.
>
> Thanks for the sample documents,
> Christian
>
> PS: 9.0.2 will be available by the end of May.
>
> [1] http://files.basex.org/releases/latest/
>
>
>
> On Tue, May 22, 2018 at 5:22 PM, Giuseppe Celano
> <cel...@informatik.uni-leipzig.de> wrote:
>
> I think I have identified a problem with atomization of attribute content
> (no database involved). I have a simple query:
>
> for $s in doc("doc1")//s//t
> for $d in doc("doc2")//case
> where $d/verb_lemma = $s/@l
>   and $d//verb_form/@value = $s/@f
>   and $d/aspect-values/@sign = "yes"
> return $s
>
> In order to get a result, I need to use the data() function, as in
> data($d/aspect-values/@sign) = "yes"; otherwise the query never returns a
> result. Is this a bug?
> I would expect the value of @sign to be atomized automatically and
> compared to "yes", but this does not seem to be the case.
> Thanks.
>
> Ciao,
> Giuseppe
>
>
>
>
