Thank you Ron and Christian for the suggestions! I’ll give both a try. Will
the map solution cause BaseX to hold every item from both databases in
memory?

Best,
Matt

On Wed, Jun 4, 2025 at 3:27 AM Christian Grün <[email protected]>
wrote:

> Hi Matt,
>
> Ron has already given you a perfect answer on how to utilize maps. Another
> slightly slower solution is to address the database twice, and let the
> compiler do its job:
>
> let $originalDB := db:get('OriginalDB')/data/item
> let $updateDB := db:get('UpdateDB')/data/item
> for $id in distinct-values(($originalDB/@id, $updateDB/@id))
> let $original := db:get('OriginalDB')/data/item[@id = $id]
> let $update := db:get('UpdateDB')/data/item[@id = $id]
> return json:serialize(map {
>   'id': xs:long($id),
>   'original': $original,
>   'update': $update
> })
>
> This may seem counterintuitive at first glance, but the expression
> "db:get('OriginalDB')/data/item[@id = $id]" is something that the optimizer
> will rewrite for index access. It is equivalent to:
>
> let $original := db:attribute("OriginalDB",
> $id)/self::attribute(id)/parent::item
> let $update := db:attribute("UpdateDB",
> $id)/self::attribute(id)/parent::item
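>
> For completeness, a map-based variant roughly along the lines Ron
> suggested could look like this (just a sketch; his actual solution may
> differ):
>
> let $original := map:merge(
>   db:get('OriginalDB')/data/item ! map:entry(string(@id), .)
> )
> let $update := map:merge(
>   db:get('UpdateDB')/data/item ! map:entry(string(@id), .)
> )
> for $id in distinct-values((map:keys($original), map:keys($update)))
> return json:serialize(map {
>   'id': xs:long($id),
>   'original': $original($id),
>   'update': $update($id)
> })
>
> Both maps are built once, so each lookup is a constant-time probe
> instead of a scan over all items; the trade-off is that both item
> sequences stay referenced in memory for the duration of the query.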
>
> Hope this helps,
> Christian
>
>
>
> On Tue, Jun 3, 2025 at 11:10 PM Matt Dziuban <[email protected]> wrote:
>
>> Hi all,
>>
>> I'm working on a query that reads from two databases -- OriginalDB and
>> UpdateDB. The structure of each database is the same:
>>
>> <data>
>>   <item id="1">...</item>
>>   <item id="2">...</item>
>>   ...
>> </data>
>>
>> OriginalDB has 2,305,570 items and UpdateDB has 307,019.
>>
>> The query gets all distinct item ids from the two databases, then gets
>> the item with a matching id (if there is one) from each database, and
>> returns the id and both nodes in a JSON object. Here's the query:
>>
>> let $originalDB := db:get('OriginalDB')/data/item
>> let $updateDB := db:get('UpdateDB')/data/item
>> for $id in distinct-values(($originalDB/@id, $updateDB/@id))
>> let $original := $originalDB[@id = $id]
>> let $update := $updateDB[@id = $id]
>> return json:serialize(map {
>>   'id': xs:long($id),
>>   'original': $original,
>>   'update': $update
>> }, map { 'indent': 'no' })
>>
>> In its current state this query is very slow -- it's returned only 35
>> JSON objects in ~30 minutes. How can I go about optimizing it to best take
>> advantage of each database's indices?
>>
>> Thanks in advance,
>> Matt
>>
>
