Great e-mail messages to wake up to! Thank you for the further explanation Liam, and Christian the examples you provided were considerably faster:
- my fastest was 70k ms - your ddex.xq was 35kms - your ddex2.xq was 10kms! There is only one issue: both ddex.xq and ddex2.xq seem to return many more results than expected (cartesian product somewhere perhaps) When I run the queries against a smaller database - one with just 6 of the DDEX documents, my query returns 70 results which matches the number of TrackReleases, but both ddex.xq and ddex2.xq return 303,134 results. It looks like a separate "copy" of the output is being created for every Party in the PartyList, when really there should be only one (specified by the PartyReference). But this is very promising - if it takes 10 seconds to return a massively expanded version of the data, then perhaps this will get to <1000ms! On Sat, Aug 22, 2020 at 4:07 AM Christian Grün <christian.gr...@gmail.com> wrote: > Hi Bill, > > Feel free to run the attached queries; maybe they give you a faster result. > > Your use case was interesting. It gave me some additional ideas on how > to speed up queries (by reordering consecutive 'for' clauses that do > not change the result). > > Cheers, > Christian > > > On Sat, Aug 22, 2020 at 6:10 AM Liam R. E. Quin <l...@fromoldbooks.org> > wrote: > > > > On Fri, 2020-08-21 at 17:28 -0700, Bill Osmond wrote: > > > I'm beginning to think that perhaps my performance hopes were a bit > > > too > > > inflated, given the size and complexity of our database. After a > > > fresh > > > optimization, and with -Xms2g -Xmx10g, the following query takes > > > 1492ms: > > > > [...] > > > > First note - there are in fact no loops in your query. Although "for" > > is used to introduce a loop in many procedural languages, it does nto > > do so in XQuery (nor does for-each in XSLT). > > > > In fact, it's closer to what SQL people know as a join. > > > > It's making a stream of n-tuples, and then evaluating the inner > > expression for each tuple, so that > > > > for $a in ( 'a', 'b', 'c') > > for $b in (1 to 5) > > return $a || '-' || $b > > > > produces 15 lines of output, > > a-1, a-2, 1-3, a-4, a-6, b-1, and so on. > > > > You can see the BaseX query plan for your query already moves your > > where clauses as i did by hand, because BaseX is awesome. > > > > To make the query fast, you either need to reduce the number of tuples, > > and henve the number of times the expressions are evaluated, or you > > need to reduce the cost of creating the tuples. > > > > Moving the where clauses was my attempt to reduce the number of tuples. > > Adding an index might reduce the cost of making the tuples, so i'd > > certainly try that. > > > > If the input document is sorted, you might be able to construct > > something recursively (e.g. with fold-left) or use grouping or > > windowing to process $parties in groups, which may help considerably. > > > > Without seeing the data, that's only a guess. > > > > Liam > > > > -- > > Liam Quin, https://www.delightfulcomputing.com/ > > Available for XML/Document/Information Architecture/XSLT/ > > XSL/XQuery/Web/Text Processing/A11Y training, work & consulting. > > Barefoot Web-slave, antique illustrations: http://www.fromoldbooks.org > > >