Hi Eliot,

I (am sorry to) agree there is no straightforward solution to speed up
the lookup of single tokens in attributes. XQuery 3.1 provides a new
string function "contains-token" [1]...

  //*[contains-token(@class, 'topic/topic')]

...but (up to now) it is not index-driven in BaseX. Some users would
love to see us extend our full-text index to attributes. This way, your
queries could be sped up as follows:

  //*[@class contains text 'topic/topic'][contains-token(@class, 'topic/topic')]

The second predicate is still required, as the full-text query would
also potentially yield hits like "topic topic" or "ToPiC-!-tOpIc".

Currently, an efficient and (if you get used to it) rather simple way
out is to create your own index...

  let $index := <index>{
    (: one <class> entry per whitespace-separated @class token :)
    for $element in db:open('db')//*[@class]
    let $id := db:node-id($element)
    for $token in $element/@class/tokenize(., '\s+')
    return <class token="{ $token }">{ $id }</class>
  }</index>
  return db:create('index', $index, 'index.xml')

...and access it in the next step:

  (: look up the node ids for a token, then fetch the original nodes :)
  for $id in db:open('index')//class[@token = 'topic/topic']
  return db:open-id('db', $id)

Hope this helps,
Christian

[1] http://docs.basex.org/wiki/XQuery_3.1#fn:contains-token

On Mon, Apr 13, 2015 at 7:38 PM, Eliot Kimber <ekim...@contrext.com> wrote:
> DITA defines the notion of a layered hierarchy of element types, where
> every DITA-defined element is either a base type or a "specialized" type
> derived from some base type. The type hierarchy of each element is
> specified by a @class attribute that lists the ancestry and leaf type of
> the element.
>
> For example, the element type "concept" is a specialization of the base
> type "topic" and so has a @class value of
> "- topic/topic concept/concept ". Each blank-delimited term is a module
> name/element name pair.
>
> Processing in DITA is "specialization aware" if selection of elements is
> in terms of a @class token rather than concrete element type. For example,
> you might apply processing to topics of any type by matching on
> "*[contains(@class, ' topic/topic ')]", which will match all DITA topics,
> regardless of their specialized type.
>
> The challenge this presents in a database context is optimizing finding of
> things based on these @class values. For large repositories an XQuery like
> "//*[contains(@class, ' topic/topic ')]" is going to be quite slow as it
> requires a string comparison of every @class value. Even if there is an
> attribute value index it will still be slow.
>
> The obvious solution would be to index by @class token, e.g., an index
> where keys are "topic/topic", "topic/p", etc.
>
> Is there a way to construct such an index in BaseX? Is there a better way
> to address this type of string-match-based lookup?
>
> Thanks,
>
> Eliot
>
> —————
> Eliot Kimber, Owner
> Contrext, LLC
> http://contrext.com