On Fri, 2020-08-21 at 17:28 -0700, Bill Osmond wrote:
> I'm beginning to think that perhaps my performance hopes were a bit too
> inflated, given the size and complexity of our database. After a fresh
> optimization, and with -Xms2g -Xmx10g, the following query takes 1492ms:

[...]

First note - there are in fact no loops in your query. Although "for"
is used to introduce a loop in many procedural languages, it does not
do so in XQuery (nor does for-each in XSLT).

In fact, it's closer to what SQL people know as a join.

It's making a stream of n-tuples, and then evaluating the inner
expression for each tuple, so that

for $a in (  'a', 'b', 'c')
  for $b in (1 to 5)
    return $a || '-' || $b

produces 15 lines of output,
a-1, a-2, a-3, a-4, a-5, b-1, and so on.
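
As a quick check of the tuple count, and assuming nothing beyond
standard XQuery 3.0, the same expression can be wrapped in count():

    count(
      for $a in ('a', 'b', 'c')
      for $b in (1 to 5)
      return $a || '-' || $b
    )

That returns 15: one result for each of the 3 values of $a combined
with each of the 5 values of $b.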

You can see that the BaseX query plan for your query already moves your
where clauses as I did by hand, because BaseX is awesome.

To make the query fast, you either need to reduce the number of tuples,
and hence the number of times the expressions are evaluated, or you
need to reduce the cost of creating the tuples.

Moving the where clauses was my attempt to reduce the number of tuples.
Adding an index might reduce the cost of making the tuples, so I'd
certainly try that.
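
A rough sketch of what moving a where clause does, using made-up data
since the real query and documents aren't shown here. The element and
attribute names (party, link, @type, @from, @to) are hypothetical:

    let $parties := (
      <party id="1" type="person"/>,
      <party id="2" type="org"/>,
      <party id="3" type="person"/>
    )
    let $links := (
      <link from="1" to="2"/>,
      <link from="3" to="2"/>
    )
    (: the first where comes straight after the first "for", so only
       the two "person" parties are ever combined with $links :)
    for $p in $parties
    where $p/@type = 'person'
    for $l in $links
    where $l/@from = $p/@id
    return $p/@id || '->' || $l/@to

This should give 1->2 and 3->2. With both where clauses at the end
instead, all 3 x 2 = 6 tuples would be made before any are discarded,
unless the optimizer moves the clauses itself.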

If the input document is sorted, you might be able to construct
something recursively (e.g. with fold-left) or use grouping or
windowing to process $parties in groups, which may help considerably.

Without seeing the data, that's only a guess.
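
To illustrate the grouping idea only, here is a minimal sketch with
invented data. The @region and @name attributes are assumptions made
for the example, not taken from your documents:

    let $parties := (
      <party region="north" name="A"/>,
      <party region="north" name="B"/>,
      <party region="south" name="C"/>
    )
    for $p in $parties
    (: after "group by", $p is bound to all parties in the group :)
    group by $region := $p/@region
    return $region || ': ' || count($p)

The return expression is evaluated once per group (two times here)
rather than once per party. The order in which the groups come out is
implementation-dependent unless you add an order by.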

Liam

-- 
Liam Quin, https://www.delightfulcomputing.com/
Available for XML/Document/Information Architecture/XSLT/
XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
Barefoot Web-slave, antique illustrations:  http://www.fromoldbooks.org
