Hi Eliot,

I (am sorry to) agree there is no straightforward solution to speed up
the lookup of single tokens in attributes. XQuery 3.1 provides a new
string function "contains-token" [1]...

  //*[contains-token(@class, 'topic/topic')]

...but (up to now) it is not index-driven in BaseX.
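Here is a quick illustration of the difference. It is a small sketch that needs no database, and the second @class value is made up for contrast. contains-token compares whole whitespace-separated tokens, so the padding blanks you need with contains become unnecessary:

```xquery
(: Runs without any database. $class is the value from your example;
   $plural is a hypothetical value, used only for contrast. :)
let $class  := '- topic/topic concept/concept '
let $plural := '- topic/topics '
return (
  contains-token($class, 'topic/topic'),   (: true: whole token match :)
  contains-token($plural, 'topic/topic'),  (: false: "topic/topics" is another token :)
  contains($plural, 'topic/topic'),        (: true: plain substring match :)
  contains($plural, ' topic/topic ')       (: false: the padding blanks prevent the false hit :)
)
```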

Some users would love to see us extend our full-text index to
attributes. This way, your queries could be sped up as follows:

  //*[@class contains text 'topic/topic'][contains-token(@class, 'topic/topic')]

The second predicate is still required, as the full-text query would
also potentially yield hits like "topic topic" or "ToPiC-!-tOpIc".
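The false positives can be reproduced without any index. This is a sketch that evaluates both predicates on plain strings. It assumes the default full-text options (case-insensitive matching, with punctuation treated as a token separator):

```xquery
(: Runs without any database. The full-text query 'topic/topic' is
   tokenized into the phrase "topic topic", so all three values match it,
   while contains-token only accepts the exact token. :)
for $value in ('- topic/topic concept/concept ', 'topic topic', 'ToPiC-!-tOpIc')
return (
  $value contains text 'topic/topic',    (: true for all three values :)
  contains-token($value, 'topic/topic')  (: true only for the first value :)
)
```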

Currently, an efficient and (if you get used to it) rather simple way
out is to create your own index...

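  let $index := <index>{
    for $element in db:open('db')//*[@class]
    let $id := db:node-id($element)
    (: normalize-space drops the leading/trailing blanks of DITA @class
       values, which would otherwise yield empty tokens :)
    for $token in tokenize(normalize-space($element/@class), ' ')
    return <class token="{ $token }">{ $id }</class>
  }</index>
  return db:create('index', $index, 'index.xml')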

...and access it in the next step:

  for $id in db:open('index')//class[@token = 'topic/topic']
  return db:open-id('db', $id)

Hope this helps,
Christian

[1] http://docs.basex.org/wiki/XQuery_3.1#fn:contains-token



On Mon, Apr 13, 2015 at 7:38 PM, Eliot Kimber <ekim...@contrext.com> wrote:
> DITA defines the notion of layered hierarchy of element types, where every
> DITA-defined element is either a base type or a "specialized" type derived
> from some base type. The type hierarchy of each element is specified by a
> @class attribute that lists the ancestry and leaf type of the element.
>
> For example, the element type "concept" is a specialization of the base
> type "topic" and so has a @class value of "- topic/topic concept/concept ".
> Each blank-delimited term is a module name/element name pair.
>
> Processing in DITA is "specialization aware" if selection of elements is
> in terms of a @class token rather than concrete element type. For example,
> you might apply processing to topics of any type by matching on
> "*[contains(@class, ' topic/topic ')]", which will match all DITA topics,
> regardless of their specialized type.
>
> The challenge this presents in a database context is optimizing finding of
> things based on these @class values. For large repositories an XQuery like
> "//*[contains(@class, ' topic/topic ')]" is going to be quite slow as it
> requires a string comparison of every @class value. Even if there is an
> attribute value index it will still be slow.
>
> The obvious solution would be to index by @class token, e.g., an index
> where keys are "topic/topic", "topic/p", etc.
>
> Is there a way to construct such an index in BaseX? Is there a better way to
> address this type of string-match-based lookup?
>
> Thanks,
>
> Eliot
>
> —————
> Eliot Kimber, Owner
> Contrext, LLC
> http://contrext.com