Re: [basex-talk] improving query performance

Bill Osmond Sat, 22 Aug 2020 07:33:01 -0700

Great e-mail messages to wake up to! Thank you for the further explanation,
Liam, and Christian, the examples you provided were considerably faster:


- my fastest was 70k ms
- your ddex.xq was 35k ms
- your ddex2.xq was 10k ms!

There is only one issue: both ddex.xq and ddex2.xq seem to return many more
results than expected (a Cartesian product somewhere, perhaps).

When I run the queries against a smaller database - one with just 6 of the
DDEX documents, my query returns 70 results which matches the number of
TrackReleases, but both ddex.xq and ddex2.xq return 303,134 results. It
looks like a separate "copy" of the output is being created for every Party
in the PartyList, when really there should be only one (specified by the
PartyReference). But this is very promising - if it takes 10 seconds to
return a massively expanded version of the data, then perhaps this will get
to <1000ms!
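
A minimal sketch of how that kind of blow-up can arise, and how a join
condition on the reference removes it. The miniature document and its
element names (Message, PartyRef, Title, Name) are made up for illustration;
they are not the real DDEX structure and not the contents of ddex.xq or
ddex2.xq.

```xquery
(: Hypothetical miniature data; element names are assumptions, not the DDEX schema. :)
let $doc :=
  <Message>
    <PartyList>
      <Party><PartyReference>P1</PartyReference><Name>Alpha</Name></Party>
      <Party><PartyReference>P2</PartyReference><Name>Beta</Name></Party>
      <Party><PartyReference>P3</PartyReference><Name>Gamma</Name></Party>
    </PartyList>
    <TrackRelease><Title>One</Title><PartyRef>P2</PartyRef></TrackRelease>
    <TrackRelease><Title>Two</Title><PartyRef>P3</PartyRef></TrackRelease>
  </Message>

(: No join condition: every release is paired with every party (2 x 3 = 6 tuples). :)
let $unjoined :=
  for $release in $doc/TrackRelease
  for $party in $doc/PartyList/Party
  return $release/Title || ' / ' || $party/Name

(: Join on the reference: only the matching party survives (2 tuples). :)
let $joined :=
  for $release in $doc/TrackRelease
  for $party in $doc/PartyList/Party
  where $party/PartyReference = $release/PartyRef
  return $release/Title || ' / ' || $party/Name

(: Returns the two counts followed by the joined strings. :)
return (count($unjoined), count($joined), $joined)
```

If the real queries behave like the first FLWOR, the result count would
scale with the number of Party elements rather than the number of
TrackReleases, which would fit the numbers above.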

On Sat, Aug 22, 2020 at 4:07 AM Christian Grün <christian.gr...@gmail.com>
wrote:

> Hi Bill,
>
> Feel free to run the attached queries; maybe they give you a faster result.
>
> Your use case was interesting. It gave me some additional ideas on how
> to speed up queries (by reordering consecutive 'for' clauses that do
> not change the result).
>
> Cheers,
> Christian
>
>
> On Sat, Aug 22, 2020 at 6:10 AM Liam R. E. Quin <l...@fromoldbooks.org>
> wrote:
> >
> > On Fri, 2020-08-21 at 17:28 -0700, Bill Osmond wrote:
> > > I'm beginning to think that perhaps my performance hopes were a bit
> > > too
> > > inflated, given the size and complexity of our database. After a
> > > fresh
> > > optimization, and with -Xms2g -Xmx10g, the following query takes
> > > 1492ms:
> >
> > [...]
> >
> > First note - there are in fact no loops in your query. Although "for"
> > is used to introduce a loop in many procedural languages, it does not
> > do so in XQuery (nor does for-each in XSLT).
> >
> > In fact, it's closer to what SQL people know as a join.
> >
> > It's making a stream of n-tuples, and then evaluating the inner
> > expression for each tuple, so that
> >
> > for $a in (  'a', 'b', 'c')
> >   for $b in (1 to 5)
> >     return $a || '-' || $b
> >
> > produces 15 lines of output,
> > a-1, a-2, a-3, a-4, a-5, b-1, and so on.
> >
> > You can see the BaseX query plan for your query already moves your
> > where clauses as I did by hand, because BaseX is awesome.
> >
> > To make the query fast, you either need to reduce the number of tuples,
> > and hence the number of times the expressions are evaluated, or you
> > need to reduce the cost of creating the tuples.
> >
> > Moving the where clauses was my attempt to reduce the number of tuples.
> > Adding an index might reduce the cost of making the tuples, so I'd
> > certainly try that.
> >
> > If the input document is sorted, you might be able to construct
> > something recursively (e.g. with fold-left) or use grouping or
> > windowing to process $parties in groups, which may help considerably.
> >
> > Without seeing the data, that's only a guess.
> >
> > Liam
> >
> > --
> > Liam Quin, https://www.delightfulcomputing.com/
> > Available for XML/Document/Information Architecture/XSLT/
> > XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
> > Barefoot Web-slave, antique illustrations:  http://www.fromoldbooks.org
> >
>
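
The tuple-stream reading of "for" quoted above, and the effect of "where"
and "group by" on the number of tuples, can be checked with a small
self-contained query. This is only a sketch: the literal sequences and
variable names are made up for illustration and are unrelated to the DDEX
data or to the attached queries.

```xquery
(: Two 'for' clauses build a stream of ($a, $b) tuples: 3 x 5 = 15 of them. :)
let $all :=
  for $a in ('a', 'b', 'c')
  for $b in (1 to 5)
  return $a || '-' || $b

(: A 'where' clause discards tuples before 'return' is evaluated: 3 x 2 = 6 remain. :)
let $filtered :=
  for $a in ('a', 'b', 'c')
  for $b in (1 to 5)
  where $b > 3
  return $a || '-' || $b

(: 'group by' collapses the stream to one tuple per distinct key: 3 groups,
   with $b rebound to the sequence of values in each group. :)
let $grouped :=
  for $a in ('a', 'b', 'c')
  for $b in (1 to 5)
  group by $a
  return $a || ': ' || string-join($b ! string(.), ',')

(: Returns the two counts followed by the three grouped strings. :)
return (count($all), count($filtered), $grouped)
```

The 'return' expression is evaluated once per surviving tuple, so both the
filter and the grouping reduce how often it runs.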
