[basex-talk] Optimizing Element Access By Attribute Value Matching

Eliot Kimber Mon, 13 Apr 2015 10:38:51 -0700

DITA defines the notion of a layered hierarchy of element types, where every
DITA-defined element is either a base type or a "specialized" type derived
from some base type. The type hierarchy of each element is specified by a
@class attribute that lists the ancestry and leaf type of the element.


For example, the element type "concept" is a specialization of the base
type "topic" and so has a @class value of
"- topic/topic concept/concept ". Each blank-delimited term after the
leading "-" is a module name/element name pair.

Processing in DITA is "specialization aware" if it selects elements by
@class token rather than by concrete element type. For example, you might
apply processing to topics of any type by matching on
"*[contains(@class, ' topic/topic ')]", which will match all DITA topics,
regardless of their specialized type.
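
To make the matching rule concrete, here is a minimal sketch in Python (standard library only). The sample document and the helper name by_class_token are invented for illustration and are not part of DITA or BaseX. It applies the same test as the XPath above: a substring match that includes the surrounding blanks.

```python
import xml.etree.ElementTree as ET

# Invented sample data: three elements carrying DITA-style @class values.
DOC = """<map>
  <topic class="- topic/topic "/>
  <concept class="- topic/topic concept/concept "/>
  <p class="- topic/p "/>
</map>"""


def by_class_token(root, token):
    """Return elements whose @class contains the given token.

    Mirrors contains(@class, ' topic/topic '): the blanks around the
    token keep it from matching inside a longer token.
    """
    needle = " " + token + " "
    return [el for el in root.iter() if needle in el.get("class", "")]


root = ET.fromstring(DOC)
print([el.tag for el in by_class_token(root, "topic/topic")])
# → ['topic', 'concept']
```

Note that every element is visited and every @class value is scanned, which is the cost that becomes a problem in a large repository.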

The challenge this presents in a database context is efficiently finding
elements by these @class values. For large repositories an XQuery like
"//*[contains(@class, ' topic/topic ')]" is going to be quite slow as it
requires a string comparison of every @class value. Even if there is an
attribute value index it will still be slow, since the match is on a
substring rather than on the whole value.

The obvious solution would be to index by @class token, e.g., an index
where keys are "topic/topic", "topic/p", etc.
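
As a rough sketch of the data structure meant here (plain Python, not BaseX API; the sample document and the function name build_class_index are invented for illustration), each @class value is split on whitespace and every module/element token becomes a key that maps to the elements carrying it:

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

# Invented sample data: three elements carrying DITA-style @class values.
DOC = """<map>
  <topic class="- topic/topic "/>
  <concept class="- topic/topic concept/concept "/>
  <p class="- topic/p "/>
</map>"""


def build_class_index(root):
    """Map each @class token (e.g. 'topic/topic') to its elements."""
    index = defaultdict(list)
    for el in root.iter():
        for tok in el.get("class", "").split():
            # Keep only module/element pairs; this skips the leading "-".
            if "/" in tok:
                index[tok].append(el)
    return index


index = build_class_index(ET.fromstring(DOC))
print(sorted(index))
# → ['concept/concept', 'topic/p', 'topic/topic']
print([el.tag for el in index["topic/topic"]])
# → ['topic', 'concept']
```

With such an index, finding all topics is a single key lookup rather than a scan of every @class value. The cost moves to building the index and keeping it current as documents change.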

Is there a way to construct such an index in BaseX? Is there a better way
to address this type of string-match-based lookup?

Thanks,

Eliot

—————
Eliot Kimber, Owner
Contrext, LLC
http://contrext.com
