Re: [basex-talk] Constructing "resolved" DITA Map in XQuery: How to Avoid High Memory usage?

Eliot Kimber Wed, 05 Apr 2023 07:25:39 -0700

I don’t think it’s multiple inclusion of the same resource, though that is a 
possibility with our content (one I’ve worked hard to eliminate in the 
latest updates to our content set).


At least based on the logging shown in the GUI, the failure happens long before 
the process could encounter a multiply-included submap, for example (which 
would be the case that might result in a loop).

To answer Liam’s question about just storing the resolved map, the challenge is 
that the resolved map needs to reflect the node IDs of the original elements, 
so pregenerating it won’t work without some way to then correlate the resolved 
map elements to their corresponding elements in the original source.

As I’ve thought about it more I think a process that walks the map tree and 
constructs XQuery maps is the best solution.
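
A minimal sketch of what such a walk might look like, written as a library module. It is not code from this thread: the namespace, the shape of the map entries, the `$resolve` callback (which turns a map reference into the root element of the referenced submap), and the `@format = 'ditamap'` test for map references are all assumptions made for illustration. `db:node-id` and `db:name` are functions from the BaseX Database Module and only work on nodes stored in a database. Key scopes are ignored here.

```xquery
module namespace f = 'urn:example:keyspace';  (: hypothetical namespace :)

(: Walks a map tree and returns one small XQuery map per key-defining
   element instead of constructing a new XML tree.
   $seen holds the submap roots on the current path (cycle guard only).
   $resolve is a caller-supplied, hypothetical function that returns the
   root element of the submap a reference points to, or () if none. :)
declare function f:collect(
  $node    as element(),
  $seen    as element()*,
  $resolve as function(element()) as element()?
) as map(*)* {
  (: record the element if it defines keys :)
  if ($node/@keys) then map {
    'keys': tokenize($node/@keys),
    'db':   db:name($node),
    'id':   db:node-id($node)
  } else (),
  (: descend into the referenced submap, or else into the children :)
  let $submap :=
    if ($node/@format = 'ditamap' and $node/@href) then $resolve($node) else ()
  let $next :=
    if ($submap) then $submap[not(. intersect $seen)] else $node/*
  for $n in $next
  return f:collect($n, ($seen, $submap), $resolve)
};
```

Called on the root element of the root map with an empty `$seen`, this yields a flat sequence of entries that still point back to the original elements by database name and node ID, so nothing has to be correlated after the fact.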

Cheers,

E.

_____________________________________________
Eliot Kimber
Sr Staff Content Engineer
O: 512 554 9368
M: 512 554 9368
servicenow.com<https://www.servicenow.com>
LinkedIn<https://www.linkedin.com/company/servicenow> | 
Twitter<https://twitter.com/servicenow> | 
YouTube<https://www.youtube.com/user/servicenowinc> | 
Facebook<https://www.facebook.com/servicenow>

From: Hans-Juergen Rennau <hren...@yahoo.de>
Date: Wednesday, April 5, 2023 at 2:30 AM
To: basex-talk@mailman.uni-konstanz.de <basex-talk@mailman.uni-konstanz.de>, 
Eliot Kimber <eliot.kim...@servicenow.com>
Subject: Re: [basex-talk] Constructing "resolved" DITA Map in XQuery: How to 
Avoid High Memory usage?

Greetings, Eliot,

could it be that the problem arises from repeated inclusion of one and the same 
resource, which is referenced by different resources? You might check this by 
determining the cumulative size of the resources to be potentially included. Is 
it really >1 GB?

Even if you use a recursive function receiving as a parameter the resources 
already processed and suppress the processing of a resource found among them, 
like so

declare function f:resolve($node, $alreadyFound) {
    (: skip any node that has already been processed :)
    if ($node intersect $alreadyFound) then () else
    ...
    ... f:resolve($child, ($node, $alreadyFound))
    ...
};

this does prevent circular inclusion, but may not be sufficient to prevent a 
combinatorial explosion. The explosion may occur if you process siblings in a 
straightforward way, so that the result of resolving one element is not fed 
into the processing of the following siblings, like so:

declare function f:resolve($node, $alreadyFound) {
    ...
    (: each child is resolved independently of its siblings :)
    ... $node/*/f:resolve(., ($node, $alreadyFound))
    ...
};

To avoid combinatorial explosion I suggest a method which I call "total 
recursion", in which each invocation of the recursive function processes only 
one node, traversing siblings recursively. (If relevant, details on demand.)
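
One possible reading of that idea, as a sketch only; the method actually meant here may differ. Each call handles exactly one node, and the state (resources already included, output so far) is passed first down to the child and then on to the next sibling, so a later sibling sees what an earlier one already resolved. The namespace, the state-map layout, and the `$resolve` callback are hypothetical.

```xquery
module namespace f = 'urn:example:resolve';  (: hypothetical namespace :)

(: $state is a map with two entries:
     'seen': roots of resources already included
     'out' : original nodes visited so far, in traversal order
   $resolve is a caller-supplied, hypothetical function returning the
   root of the resource a node references, or () if it references none. :)
declare function f:walk(
  $node    as element()?,
  $state   as map(*),
  $resolve as function(element()) as element()?
) as map(*) {
  if (empty($node)) then $state
  else
    let $target := $resolve($node)
    let $skip   := exists($target intersect $state?seen)
    let $here   := map {
      'seen': ($state?seen, $target[not($skip)]),
      'out':  ($state?out, $node)
    }
    (: go down: into the referenced resource if new, else the first child :)
    let $down  := if (empty($target)) then $node/*[1] else $target[not($skip)]
    let $below := f:walk($down, $here, $resolve)
    (: go right: the next sibling receives everything found so far :)
    return f:walk($node/following-sibling::*[1], $below, $resolve)
};
```

A call such as `f:walk($root, map { 'seen': (), 'out': () }, $resolve)` would start the traversal. Because 'out' holds the original nodes rather than copies, their identity is preserved.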

Kind regards,
Hans-Jürgen

On Wednesday, 5 April 2023 at 01:13:10 CEST, Eliot Kimber 
<eliot.kim...@servicenow.com> wrote:



I’m implementing a feature of DITA which involves pulling together all the DITA 
maps and submaps linked from a root map so that you can then process them as a 
single unit in order to then construct “key spaces”, which are defined by the 
topicrefs contained in the maps and which depend on both the structural 
hierarchy defined by the tree of maps and submaps and on the markup details of 
both the maps and the topicref elements. It’s a challenging bit of data 
processing.



In other contexts where I’ve implemented this processing I start by creating a 
“resolved map” using a relatively simple transform, resulting in a single XML 
document with all the stuff needed to then construct the DITA key space. With 
the resolved map, the logic to construct the key space is a relatively simple 
three-phase process.



My naïve attempt to do this in BaseX using the normal typeswitch approach to 
implement an identity transform worked at a small scale, but for our real DITA 
maps, which have tens of thousands of elements, the process quickly exhausts 
the 2GB of RAM allocated to the BaseX GUI.



The reason I’m doing the transform in XQuery and not just using Saxon via the 
XSLT module is that I need to annotate the resulting resolved map with the 
database node IDs of each element so I can then capture those details in the 
final key space, which I’m storing as XML in another database. The constructed 
key space acts as an index where the input is a context element/key name pair 
and the result is the topicref element that defines the key. From that 
topicref I can then get the resource associated with it (i.e., the topic it 
references or a string it defines or whatever it might be).
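
For illustration, a typeswitch-based identity transform that annotates each copied element with its database node ID could look roughly like the following. This is a sketch of the general technique, not the code described above; the namespace and the `data-node-id` attribute name are hypothetical, and submap resolution is left out. `db:node-id` is the BaseX Database Module function and requires nodes stored in a database.

```xquery
module namespace f = 'urn:example:identity';  (: hypothetical namespace :)

(: Recursive copy of a database node. Every element copy carries the
   node ID of the element it was copied from. :)
declare function f:copy($node as node()) as node()* {
  typeswitch ($node)
    case document-node() return
      document { $node/node() ! f:copy(.) }
    case element() return
      element { node-name($node) } {
        (: hypothetical annotation attribute :)
        attribute data-node-id { db:node-id($node) },
        $node/@*,
        $node/node() ! f:copy(.)
      }
    (: text, comments and processing instructions are returned as-is :)
    default return $node
};
```

Note that this builds a complete new in-memory tree, so its memory footprint grows with the size of everything it copies.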



So my first question is: Is there a general technique for doing this kind of 
identity transform that won’t blow up the memory? I suspect the answer is “no” 
but figured I’d ask.



Or is it possible to apply a Saxon transform to content pulled from the BaseX 
database and have access to the node IDs? I didn’t immediately see a way that 
you could do that.  There must be a pretty sharp separation between BaseX and 
Saxon here, but again, maybe I missed something?



If the answer is “no” I can work out a more sophisticated way to build the 
initial data from which the key space is ultimately constructed by walking the 
map tree and populating XQuery maps or something, but I was hoping to keep my 
simple code that just operates on the resolved map.



Thanks,



Eliot
