[MarkLogic Dev General] Data Modelling for "option lists" (Anne Taylor)

Gary Vidal Fri, 03 Mar 2017 11:04:18 -0800

Anne,

There are a few approaches to lists that you can consider.  Static lists
often serve two use cases: the values that are possible versus the values
that exist.
Where you need the possible values, using static lists can be as simple as
storing each list as a separate document, then having services that
reconcile the static list against the data that carries the values.  One
caveat with this approach is lists that see a lot of updates and reads.
Because the list is a single document, you get lock contention, which can
be a performance bottleneck because updates block reads until the update
is committed.  So it is important that these lists are rarely updated.
If the list has a bit of volatility, consider storing each item in the
list as its own document, perhaps grouped in a collection named after the
entity, to reduce the lock contention.  But then a refresh means updating
multiple documents, so the list should be small enough to fit into a
single transaction.  How much this matters depends mostly on the
cardinality of the data and your get/update patterns.  Every read of the
list will also require reading multiple documents.
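
To make the trade-off between the two layouts concrete, here is a small
sketch in plain Python.  It is only a model: the dict keys stand in for
document URIs, and the URI patterns and function names are hypothetical,
not anything MarkLogic provides.

```python
# Plain-Python model of the two storage layouts. Dict keys stand in for
# document URIs; nothing here is a MarkLogic API.

def single_document_layout(entity, values):
    """Whole list in one document: one URI to read, one URI to lock."""
    return {f"/lists/{entity}.xml": list(values)}


def per_item_layout(entity, values):
    """One document per item, grouped by a collection named after the entity."""
    return {
        f"/lists/{entity}/{value}.xml": {"value": value, "collection": entity}
        for value in values
    }


def uris_touched_by_update(layout, entity, value):
    """URIs that must be written (and so locked) to change a single item."""
    single_uri = f"/lists/{entity}.xml"
    if single_uri in layout:
        return [single_uri]
    return [f"/lists/{entity}/{value}.xml"]


countries = ["US", "CA", "MX"]
single = single_document_layout("country", countries)
per_item = per_item_layout("country", countries)

# Documents that must be read to enumerate the whole list in each layout.
reads_single = len(single)
reads_per_item = len(per_item)
```

In this model an update to one item locks the whole list in the first
layout and only one small document in the second, while reading the full
list costs one document read versus one read per item.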


Another approach is to use the lists only as a means to enumerate the
possible values when updating documents, and to use range indexes on the
documents to return values at query time, for example in faceted
navigation.  This ensures that search clients are presented with the
values actually in use, rather than a list of values that may return no
results.
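
The difference between "possible" and "existing" values can be sketched in
plain Python.  This is only an illustration of what a facet over a range
index gives you, not MarkLogic code; the sample documents and function
names are made up for the example.

```python
from collections import Counter


def facet_counts(documents, field):
    """Count how often each value of `field` actually occurs in the data."""
    return Counter(doc[field] for doc in documents if field in doc)


def unused_values(possible_values, counts):
    """Values the static list allows but no document currently carries."""
    return [value for value in possible_values if counts[value] == 0]


# Static list: what an editing form may offer.
possible_countries = ["US", "CA", "MX"]

# Hypothetical documents: what the data actually contains.
documents = [
    {"uri": "/documents/1.xml", "country": "US"},
    {"uri": "/documents/2.xml", "country": "US"},
    {"uri": "/documents/3.xml", "country": "CA"},
]

counts = facet_counts(documents, "country")
facet = sorted(counts.items())
unused = unused_values(possible_countries, counts)
```

A facet built from the data never offers "MX" here, whereas a search form
driven by the static list would offer it and return nothing.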

A further nuance on this approach is to use semantic lists, such as SKOS
or other controlled vocabularies, to define the lists, and then assign
documents to those subject IRIs to link documents to vocabularies.  This
approach allows joining data together, but also allows lists to be
enhanced outside of the documents for a richer search or retrieval
experience.  Consider a static list of states or countries.  You may want
to build a regional context for each country, or link states/cities to
their country.  With a flat list and documents statically assigned to
those values, you lose the ability to query on those broader definitions.
With the triple approach, values assigned at the lowest leaves of a
hierarchy such as city->state->country can resolve to regions without
changing the data to reflect those constructs.

Consider the use case as follows.
Define the static list as triples using SKOS:

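URI: /lists/countries.xml (collection: country)
<countries xmlns:sem="http://marklogic.com/semantics">
  <!-- literal objects carry a datatype or xml:lang; bare objects are IRIs -->
  <country value="US">
    <sem:triple>
      <sem:subject>urn:countries:US</sem:subject>
      <sem:predicate>http://www.w3.org/1999/02/22-rdf-syntax-ns#type</sem:predicate>
      <sem:object>urn:countries</sem:object>
    </sem:triple>
    <sem:triple>
      <sem:subject>urn:countries:US</sem:subject>
      <sem:predicate>http://www.w3.org/2004/02/skos/core#prefLabel</sem:predicate>
      <sem:object
        datatype="http://www.w3.org/2001/XMLSchema#string">United States of America</sem:object>
    </sem:triple>
    <sem:triple>
      <sem:subject>urn:countries:US</sem:subject>
      <sem:predicate>http://www.w3.org/2004/02/skos/core#altLabel</sem:predicate>
      <sem:object
        datatype="http://www.w3.org/2001/XMLSchema#string">USA</sem:object>
    </sem:triple>
    <sem:triple>
      <sem:subject>urn:countries:US</sem:subject>
      <sem:predicate>http://www.w3.org/2004/02/skos/core#altLabel</sem:predicate>
      <sem:object
        datatype="http://www.w3.org/2001/XMLSchema#string">United States</sem:object>
    </sem:triple>
    <sem:triple>
      <sem:subject>urn:countries:US</sem:subject>
      <sem:predicate>http://www.w3.org/2004/02/skos/core#altLabel</sem:predicate>
      <sem:object xml:lang="es">Estados Unidos</sem:object>
    </sem:triple>
  </country>
  ...
</countries>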

URI: /documents/1.xml
<document xmlns:sem="http://marklogic.com/semantics">
  <properties>
    <country>US</country>
  </properties>
  <sem:triples>
    <sem:triple>
      <sem:subject>urn:documents:1</sem:subject>
      <sem:predicate>property:inCountry</sem:predicate>
      <sem:object>urn:countries:US</sem:object>
    </sem:triple>
  </sem:triples>
</document>

The document above allows value-based search, and the added semantic
triple lets the relationship interact with your country ontology.

Now let's consider the case of our region.
URI: /ontologies/regions-NorthAmerica.xml

<regions xmlns:sem="http://marklogic.com/semantics">
  <region value="North America">
    <sem:triple>
      <sem:subject>urn:regions:NA</sem:subject>
      <sem:predicate>http://www.w3.org/1999/02/22-rdf-syntax-ns#type</sem:predicate>
      <sem:object>urn:regions</sem:object>
    </sem:triple>
    <sem:triple>
      <sem:subject>urn:regions:NA</sem:subject>
      <sem:predicate>http://www.w3.org/2004/02/skos/core#prefLabel</sem:predicate>
      <sem:object
        datatype="http://www.w3.org/2001/XMLSchema#string">North America</sem:object>
    </sem:triple>
    <sem:triple>
      <sem:subject>urn:regions:NA</sem:subject>
      <sem:predicate>http://www.w3.org/2004/02/skos/core#altLabel</sem:predicate>
      <sem:object
        datatype="http://www.w3.org/2001/XMLSchema#string">America Northern</sem:object>
    </sem:triple>
    <sem:triple>
      <sem:subject>urn:regions:NA</sem:subject>
      <sem:predicate>http://www.w3.org/2004/02/skos/core#narrower</sem:predicate>
      <sem:object>urn:countries:US</sem:object>
    </sem:triple>
    <sem:triple>
      <sem:subject>urn:regions:NA</sem:subject>
      <sem:predicate>http://www.w3.org/2004/02/skos/core#narrower</sem:predicate>
      <sem:object>urn:countries:CA</sem:object>
    </sem:triple>
    <sem:triple>
      <sem:subject>urn:regions:NA</sem:subject>
      <sem:predicate>http://www.w3.org/2004/02/skos/core#narrower</sem:predicate>
      <sem:object>urn:countries:MX</sem:object>
    </sem:triple>
  </region>
</regions>

As you can see from the examples above, we have different ways of
assigning and attaching meaning that can support multi-modal concerns in
our content.  So let's take the example through the query phase and see
how multi-modal queries can be used.

Query Documents where Country  = "US"
cts:search(/document,cts:element-value-query(xs:QName("country"),"US"))

But what if the user queries Country  = "United States of America"?

(: Resolve the label (preferred or alternate) to country IRIs, then find
   the documents that have a triple pointing at one of those IRIs. :)
let $values :=
  sem:sparql("
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    SELECT DISTINCT ?country
    WHERE {
      ?country a <urn:countries> ;
               (skos:prefLabel|skos:altLabel) ?label .
      FILTER(STR(?label) = $countryQuery)
    }
  ", map:entry("countryQuery", "United States of America"))
  ! map:get(., "country")
return
  cts:search(/document, cts:triple-range-query((), (), $values))
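
The logic of that query (label to concept IRI, then concept IRI to
documents) can be followed in a small plain-Python sketch over the same
triples.  This is not how MarkLogic evaluates it; the prefixed names are
shorthand strings and the function names are made up for illustration.

```python
# Triples as (subject, predicate, object) tuples, using prefixed names as
# plain-string shorthand for the IRIs in the documents above.
TYPE = "rdf:type"
PREF = "skos:prefLabel"
ALT = "skos:altLabel"
IN_COUNTRY = "property:inCountry"

triples = [
    ("urn:countries:US", TYPE, "urn:countries"),
    ("urn:countries:US", PREF, "United States of America"),
    ("urn:countries:US", ALT, "USA"),
    ("urn:countries:US", ALT, "United States"),
    ("urn:documents:1", IN_COUNTRY, "urn:countries:US"),
]


def concepts_for_label(triples, scheme, label):
    """Subjects typed as `scheme` whose preferred or alternate label matches."""
    typed = {s for s, p, o in triples if p == TYPE and o == scheme}
    return {
        s for s, p, o in triples
        if s in typed and p in (PREF, ALT) and o == label
    }


def documents_linked_to(triples, concepts):
    """Document IRIs that point at any of the given concept IRIs."""
    return sorted(
        s for s, p, o in triples if p == IN_COUNTRY and o in concepts
    )


matches = concepts_for_label(triples, "urn:countries", "USA")
docs = documents_linked_to(triples, matches)
```

Whether the user types "USA", "United States" or the preferred label, the
lookup lands on urn:countries:US and therefore on the same documents.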

Now let's consider regions as described above.  The countries themselves
are not aware of which region they are assigned to, and the documents are
only tagged at the country level (this could even be the state or the
city/municipality).

Query: region:"North America"

(: Resolve the region label, then follow skos:narrower transitively to
   collect every location IRI underneath that region. :)
let $locations :=
  sem:sparql("
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    SELECT DISTINCT ?location
    WHERE {
      ?region a <urn:regions> ;
              (skos:prefLabel|skos:altLabel) ?label .
      FILTER(STR(?label) = $regionQuery)
      ?region skos:narrower* ?location .
    }
  ", map:entry("regionQuery", "North America"))
  ! map:get(., "location")
return
  cts:search(/document, cts:triple-range-query((), (), $locations))

The $locations SPARQL resolves all skos:narrower relationships
transitively, so it matches any document linked to a country in (US, CA,
MX).  If those countries are in turn associated with states and cities
through further broader/narrower relationships, those locations are added
to the query as well.
So to summarize, the general goal is to identify the list of possible
values, to return the documents that carry those values, and to decouple
the concerns of the list so that you can enhance your queries without
modelling all these relationships in your content.  Using a multi-modal
document and query structure allows a lot of flexibility for building
rich applications on MarkLogic.
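
For anyone who wants to see what the transitive skos:narrower resolution
amounts to, here is a plain-Python sketch of the same traversal.  It is an
illustration only, not MarkLogic code, and the state-level concept
urn:states:US-CA is a hypothetical addition that is not in the data above.

```python
from collections import deque

NARROWER = "skos:narrower"  # plain-string shorthand for the SKOS IRI

triples = [
    ("urn:regions:NA", NARROWER, "urn:countries:US"),
    ("urn:regions:NA", NARROWER, "urn:countries:CA"),
    ("urn:regions:NA", NARROWER, "urn:countries:MX"),
    # Hypothetical state-level concept, added only for this example.
    ("urn:countries:US", NARROWER, "urn:states:US-CA"),
]


def narrower_closure(triples, start):
    """Return `start` plus everything reachable through skos:narrower.

    Like the narrower* path, the starting concept itself is included.
    """
    children = {}
    for subject, predicate, obj in triples:
        if predicate == NARROWER:
            children.setdefault(subject, []).append(obj)

    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:  # guards against cycles in the vocabulary
                seen.add(child)
                queue.append(child)
    return seen


locations = narrower_closure(triples, "urn:regions:NA")
```

Adding a new level to the vocabulary (states, cities) widens the result
of the traversal without touching any of the tagged documents.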

TL;DR: it's okay.  You're on the right track, and I hope this helps.

If you are interested in more thoughts on this subject please feel free to
contact me directly.

Gary Vidal

