GSoC: Cache tables for SPARQL queries

Andy Seaborne Mon, 15 Apr 2013 06:52:09 -0700

== Property Tables

Property tables are a technique for speeding up queries by providing additional ways of accessing the data other than the triple table.


They can be used for:

+ data that is reasonably regular
+ caching partial query evaluations ahead of time
+ efficient inference for subclass/subproperty relationships.

A property table is a table in which each column denotes a variable in part of a SPARQL pattern. It may have a column for the subject and one or more columns for properties of that subject, but there are other possibilities.


A row in a property table matches a SPARQL basic graph pattern.

Example:

Suppose a dataset includes information about people, and that each person always has a first name, a last name and a formal form of address.


A property table might be:

subject URI                 first    last       Formal
                             name     name       name

(http://example/person#afs,  "Fred", "Smith",  "Frederick Smith")

and matches the SPARQL pattern

{ ?person foaf:familyName ?fName ;
          foaf:givenName  ?gName ;
          ex:formalName   ?formal
}

but it can also be used to efficiently answer both partial occurrences of that pattern and ones where some terms are fixed:


  [] foaf:familyName ?fName ;
     foaf:givenName  ?gName ;
     ex:formalName   "Frederick Smith" .
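To illustrate how one table row can answer such a pattern, here is a minimal Python sketch. It is not Jena or TDB code: the in-memory layout and the names COLUMNS, ROWS and match are assumptions made for this example only. A pattern maps column names to either a variable (a string starting with "?") or a fixed term, and columns left out of the pattern are simply not constrained.

```python
# Hypothetical in-memory property table: one row per subject,
# one column per property (names are illustrative only).
COLUMNS = ("subject", "foaf:givenName", "foaf:familyName", "ex:formalName")

ROWS = [
    ("http://example/person#afs", "Fred", "Smith", "Frederick Smith"),
]


def is_var(term):
    """A term is treated as a variable if it starts with '?'."""
    return isinstance(term, str) and term.startswith("?")


def match(pattern, rows=ROWS, columns=COLUMNS):
    """Return one binding dict per row that satisfies the pattern.

    pattern maps a column name to a variable ('?x') or a fixed term.
    Columns not mentioned in the pattern are unconstrained, which is
    how partial occurrences of the full pattern are answered.
    """
    results = []
    for row in rows:
        binding = {}
        ok = True
        for col, term in pattern.items():
            value = row[columns.index(col)]
            if is_var(term):
                # The same variable used twice must bind to the same value.
                if binding.get(term, value) != value:
                    ok = False
                    break
                binding[term] = value
            elif term != value:
                # A fixed term must equal the stored value.
                ok = False
                break
        if ok:
            results.append(binding)
    return results


# The pattern above: subject is a blank node (unconstrained),
# formal name is fixed, the other two properties are variables.
solutions = match({
    "foaf:familyName": "?fName",
    "foaf:givenName": "?gName",
    "ex:formalName": "Frederick Smith",
})
```

Each solution comes from a single row read, where the triple table would need one access per triple pattern.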

This is a simple example with only 3 properties. In the real world, a resource may have tens of properties, so reducing the number of database accesses may be significant and improve caching.

The basic pattern matched doesn't have to be "same subject" - it might be a complex query, such as:


    SELECT ?s (count(*) AS ?c) { ?s ?p ?o } GROUP BY ?s

with a table of (?s, ?c)
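As a rough sketch of what such a precomputed (?s, ?c) table could hold, the following Python computes it from a small list of triples. The sample data and the name build_count_table are assumptions for illustration; a real system would read from the triple store rather than from a list.

```python
from collections import Counter


def build_count_table(triples):
    """Map each subject to the number of triples it appears in,
    i.e. the result of grouping { ?s ?p ?o } by ?s and counting."""
    return Counter(s for s, _p, _o in triples)


# Illustrative data only.
TRIPLES = [
    ("http://example/person#afs", "foaf:givenName", "Fred"),
    ("http://example/person#afs", "foaf:familyName", "Smith"),
    ("http://example/person#afs", "ex:formalName", "Frederick Smith"),
    ("http://example/person#xyz", "foaf:givenName", "Jane"),
]

count_table = build_count_table(TRIPLES)
```

A query asking for the count of a given subject then becomes a single lookup in count_table instead of a scan over all of that subject's triples.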

"Property table" is just the conventional name for this approach because the first systems here were just basic graph patterns for RDQL.

A query compiler could spot parts of a query pattern that can be answered from a precomputed additional table instead of accessing the conventional triple table many times.
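One simple way a compiler might do the spotting is sketched below in Python. This is an assumption about how it could work, not a description of TDB's optimizer: triple patterns are grouped by subject, and a group is routed to the property table when at least two of its predicates are table columns. The names TABLE_PREDICATES, split_bgp and min_patterns are hypothetical.

```python
from collections import defaultdict

# Predicates assumed to be columns of the hypothetical property table.
TABLE_PREDICATES = frozenset(
    {"foaf:givenName", "foaf:familyName", "ex:formalName"}
)


def split_bgp(bgp, table_predicates=TABLE_PREDICATES, min_patterns=2):
    """Split a basic graph pattern into table accesses and leftovers.

    bgp is a list of (subject, predicate, object) triple patterns.
    Returns (table_groups, remaining): table_groups maps a subject to
    the patterns answerable by one property table access; remaining
    lists the patterns still sent to the triple table.
    """
    by_subject = defaultdict(list)
    for tp in bgp:
        by_subject[tp[0]].append(tp)

    table_groups = {}
    remaining = []
    for subject, group in by_subject.items():
        covered = [tp for tp in group if tp[1] in table_predicates]
        if len(covered) >= min_patterns:
            # Worth one table access instead of several triple lookups.
            table_groups[subject] = covered
            remaining.extend(tp for tp in group if tp[1] not in table_predicates)
        else:
            remaining.extend(group)
    return table_groups, remaining


bgp = [
    ("?person", "foaf:familyName", "?fName"),
    ("?person", "foaf:givenName", "?gName"),
    ("?person", "ex:worksFor", "?org"),
    ("?org", "ex:name", "?orgName"),
]
table_groups, remaining = split_bgp(bgp)
```

Here the two name patterns on ?person are answered from the table, while the ex:worksFor and ex:name patterns still go to the triple table.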


This project would apply this idea to Jena TDB.

It involves:
1/ spotting query patterns
2/ building the property table
3/ maintaining the table as data changes

A focus on primarily read-only data for publication means that (3) can be a process that runs in the background at regular intervals.
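Steps (2) and (3) could be sketched as follows in Python, under the assumption that each property is single-valued and that maintenance is a full rebuild rather than an incremental update. The function name build_property_table and the sample data are illustrative only; scheduling the rebuild in the background is left out.

```python
def build_property_table(triples, predicates):
    """Build rows (subject, value-per-predicate...) from triples.

    Only subjects that have every listed predicate get a row.
    Assumes single-valued properties: a later value overwrites an
    earlier one for the same subject and predicate.
    """
    values = {}
    for s, p, o in triples:
        if p in predicates:
            values.setdefault(s, {})[p] = o
    return [
        (s,) + tuple(props[p] for p in predicates)
        for s, props in sorted(values.items())
        if all(p in props for p in predicates)
    ]


PREDICATES = ("foaf:givenName", "foaf:familyName", "ex:formalName")

triples = [
    ("http://example/person#afs", "foaf:givenName", "Fred"),
    ("http://example/person#afs", "foaf:familyName", "Smith"),
    ("http://example/person#afs", "ex:formalName", "Frederick Smith"),
    ("http://example/person#xyz", "foaf:givenName", "Jane"),
    ("http://example/person#xyz", "foaf:familyName", "Jones"),
]

# Step 2: initial build. The second person lacks a formal name, so no row.
table = build_property_table(triples, PREDICATES)

# Step 3: the data changes, and a periodic job rebuilds the table.
triples.append(("http://example/person#xyz", "ex:formalName", "Jane Jones"))
rebuilt = build_property_table(triples, PREDICATES)
```

Between rebuilds the table may lag behind the data, which is the trade-off accepted for primarily read-only datasets.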
