Thank you Ron and Christian for the suggestions! I’ll give both a try. Will the map solution cause BaseX to hold every item from both databases in memory?
Best, Matt On Wed, Jun 4, 2025 at 3:27 AM Christian Grün <[email protected]> wrote: > Hi Matt, > > Ron has already given you a perfect answer on how to utilize maps. Another > slightly slower solution is to address the database twice, and let the > compiler do its job: > > let $originalDB := db:get('OriginalDB')/data/item > let $updateDB := db:get('UpdateDB')/data/item > for $id in distinct-values(($originalDB/@id, $updateDB/@id)) > let $original := db:get('OriginalDB')/data/item[@id = $id] > let $update := db:get('UpdateDB')/data/item[@id = $id] > return json:serialize(map { > 'id': xs:long($id), > 'original': $original, > 'update': $update > }) > > This may seem counterintuitive at first glance, but the expression > "db:get('OriginalDB')/data/item[@id = $id]" is something that the optimizer > will rewrite for index access. It is equivalent to: > > let $original := db:attribute("OriginalDB", > $id)/self::attribute(id)/parent::item > let $update := db:attribute("UpdateDB", > $id)/self::attribute(id)/parent::item > > Hope this helps, > Christian > > > > On Tue, Jun 3, 2025 at 11:10 PM Matt Dziuban <[email protected]> wrote: > >> Hi all, >> >> I'm working on a query that reads from two databases -- OriginalDB and >> UpdateDB. The structure of each database is the same: >> >> <data> >> <item id="1">...</item> >> <item id="2">...</item> >> ... >> </data> >> >> OriginalDB has 2,305,570 items and UpdateDB has 307,019. >> >> The query gets all distinct item ids from the two databases, then gets >> the item with a matching id (if there is one) from each database, and >> returns the id and both nodes in a JSON object. Here's the query: >> >> let $originalDB := db:get('OriginalDB')/data/item >> let $updateDB := db:get('UpdateDB')/data/item >> for $id in distinct-values(($originalDB/@id, $updateDB/@id)) >> let $original := $originalDB[@id = $id] >> let $update := $updateDB[@id = $id] >> return json:serialize(map { >> 'id': xs:long($id), >> 'original': $original, >> 'update': $update >> }, map { 'indent': 'no' }) >> >> In its current state this query is very slow -- it's returned only 35 >> JSON objects in ~30 minutes. How can I go about optimizing it to best take >> advantage of each database's indices? >> >> Thanks in advance, >> Matt >> >

