In my Mirabel project I’m loading large quantities of DITA content (40K topics 
per content database).

I’m turning off DTD parsing because otherwise it takes hours to load the 
content (I haven’t had bandwidth to figure out how to use Xerces grammar 
caching with BaseX, which would solve the problem).

DITA uses an attribute-based mechanism (@class) to associate elements with 
their base types. For example, a <concept> element is a kind of <topic>, as 
indicated by the @class value "- topic/topic concept/concept ".

To find DITA elements you always search on @class values rather than tag names. 
In XSLT 3 this is best expressed as match="*[contains-token(@class, 
'topic/topic')]" to handle any topic element.
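
The same test works in XQuery 3.1. Here is a minimal sketch, using a made-up inline fragment with illustrative id values, of how one token test selects both specializations and ignores unrelated elements:

```xquery
(: Illustrative fragment only; real DITA topics are full documents. :)
let $doc :=
  <dita>
    <concept class="- topic/topic concept/concept " id="c1"/>
    <task class="- topic/topic task/task " id="t1"/>
    <p class="- topic/p "/>
  </dita>
(: contains-token splits @class on whitespace and compares whole tokens :)
return $doc/*[contains-token(@class, 'topic/topic')]/@id/string()
(: yields "c1" and "t1"; the p element is not selected :)
```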

These attributes are normally defaulted in grammars so they are not explicit in 
the source as imported and thus do not exist in my no-DTD content databases.

To work around this I’ve implemented a function that hard codes the 
element-to-class-value mapping to enable selecting elements by @class value. 
(For any set of content you can know what the @class value is for any element 
type because it is statically declared in the governing grammar, so it is 
possible to have this kind of hard-coded element-to-class mapping.)
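
For illustration, such a mapping function might be sketched roughly as follows. This is not the actual dutils code: the namespace URI, the map variable, and the handful of map entries are assumptions, and a real mapping would cover every element type in the governing grammar.

```xquery
declare namespace dutils = "urn:example:dutils";  (: hypothetical namespace URI :)

(: Hypothetical, partial element-name-to-@class mapping :)
declare variable $dutils:classMap as map(xs:string, xs:string) := map {
  'topic'   : '- topic/topic ',
  'concept' : '- topic/topic concept/concept ',
  'task'    : '- topic/topic task/task ',
  'p'       : '- topic/p '
};

(: True if the element's explicit @class, or failing that its mapped
   class value, contains the given token :)
declare function dutils:class(
  $elem  as element(),
  $token as xs:string
) as xs:boolean {
  contains-token(
    ($elem/@class, $dutils:classMap(local-name($elem)))[1],
    $token
  )
};

(: A concept with no explicit @class still matches as a topic :)
dutils:class(<concept/>, 'topic/topic')
(: returns true() :)
```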

However, I now realize that if the attributes are in the data as stored in 
BaseX, that lookup will be much faster using an attribute index. That is, rather 
than using a predicate like [dutils:class(., 'topic/topic')] I could do 
db:token($database, 'topic/topic', 'class').
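
Spelled out as a query, the index-based lookup might look like the sketch below. It assumes the database was built with the token index enabled (the TOKENINDEX option) and already carries explicit @class attributes; 'mydb' is a placeholder name.

```xquery
(: 'mydb' is a placeholder; the token index must exist for this lookup :)
let $database := 'mydb'
(: db:token returns the matching @class attribute nodes;
   the parent step gets back to the elements themselves :)
return db:token($database, 'topic/topic', 'class')/..
```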

Because I have the element-to-class mapping I can add the attributes to the 
content after the initial database load.

My question: How best to do that?

I can have an updating function that simply adds the @class attribute to all 
the elements but I suspect there are some performance implications for doing 
that kind of bulk update to millions of nodes.
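
The naive version of that updating function would be something like the following sketch. The namespace URI, the map entries, and the database name 'mydb' are placeholders, and this shows only the simple single-pass approach, not a tuned one.

```xquery
declare namespace dutils = "urn:example:dutils";  (: hypothetical namespace URI :)

(: Hypothetical, partial element-name-to-@class mapping :)
declare variable $dutils:classMap as map(xs:string, xs:string) := map {
  'topic'   : '- topic/topic ',
  'concept' : '- topic/topic concept/concept ',
  'p'       : '- topic/p '
};

(: Add @class to every element that lacks one and has a known mapping.
   All inserts are collected and applied together when the query ends. :)
declare %updating function dutils:add-class(
  $docs as document-node()*
) {
  for $elem in $docs//*[not(@class)]
  let $class := $dutils:classMap(local-name($elem))
  where exists($class)
  return insert node attribute class { $class } into $elem
};

(: collection('mydb') addresses the documents of the placeholder database :)
dutils:add-class(collection('mydb'))
```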

So any guidance on this kind of bulk update would be helpful.

Thanks,

Eliot
_____________________________________________
Eliot Kimber
Sr Staff Content Engineer
