On 05/12/11 13:58, Mariano Rodriguez wrote:
In this case of this first initial round of benchmarks we want to avoid any Hadoop or map-reduce approaches. The reason is that we want to have raw numbers of the core reasoning techniques, in this case forward chaining vs. backward chaining and our technique called semantic indexes which is a bit like backward chaining but with a tiny bit of extra work at loading time. We want to avoid evaluating benefits from the architecture of the system (map-reduce for example) because the technique that we are testing can also be extended with map-reduce and a parallel architecture.

In the past, I've experimented with forward-chaining the schema and doing one step of backward chaining in the query.

Merely forward chaining everything (even just the useful subclass, subproperty, domain and range as is done by riotcmd.infer) causes triple bloat and, at scale, the bloat can reduce the effectiveness of disk caching.
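To make the bloat concrete, here is a toy sketch in Python of forward chaining just the subclass rule. The class names and data are made up for illustration, and this is not how riotcmd.infer is implemented; it only shows how the number of rdf:type triples grows with the depth of the hierarchy.

```python
# Toy schema and data; all names are hypothetical.
SUBCLASS = {("GradStudent", "Student"), ("Student", "Person"), ("Person", "Agent")}
DATA = {("alice", "rdf:type", "GradStudent"), ("bob", "rdf:type", "Student")}


def forward_chain_types(data, subclass):
    """Materialise every rdf:type triple implied by rdfs:subClassOf."""
    triples = set(data)
    changed = True
    while changed:  # repeat until no new triple is produced
        changed = False
        for (s, p, o) in list(triples):
            if p != "rdf:type":
                continue
            for (sub, sup) in subclass:
                inferred = (s, "rdf:type", sup)
                if sub == o and inferred not in triples:
                    triples.add(inferred)
                    changed = True
    return triples


materialised = forward_chain_types(DATA, SUBCLASS)
print(len(DATA), "triples before,", len(materialised), "after")
```

Each instance ends up with one rdf:type triple per ancestor class, so deeper hierarchies multiply the stored data.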

But pure backward chaining has a horrible access pattern on the data (walking arbitrary length paths):

?x rdf:type/rdfs:subClassOf* :type

?x ?p ?v . ?p rdfs:subPropertyOf* :property

(obviously you don't have to do it this way - this is just the naive way and it can be written in SPARQL 1.1 - it's even in the spec).
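For illustration only, the naive evaluation of the first pattern looks roughly like this in Python. The dictionaries stand in for the store and every name is hypothetical; the point is the per-instance walk up the hierarchy at query time.

```python
# Toy stand-ins for the store; all names are hypothetical.
PARENTS = {"GradStudent": ["Student"], "Student": ["Person"]}  # rdfs:subClassOf
TYPES = [("alice", "GradStudent"), ("bob", "Student"), ("acme", "Organization")]


def is_subclass_of(cls, target, parents):
    """Follow rdfs:subClassOf upward from cls for zero or more steps (the * in the path)."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current == target:
            return True
        if current in seen:  # guard against cycles in the schema
            continue
        seen.add(current)
        stack.extend(parents.get(current, ()))
    return False


def instances_of(target, types, parents):
    """Naive ?x rdf:type/rdfs:subClassOf* :type -- one path walk per typed instance."""
    return sorted(x for (x, cls) in types if is_subclass_of(cls, target, parents))


print(instances_of("Person", TYPES, PARENTS))
```

Against a disk-backed store, each step of that walk is another lookup, which is where the bad access pattern comes from.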

Assuming the schema is small compared to the data and fixed, preprocessing the schema to have a single table of (type, supertype) with the transitive closure turns it into two patterns:

?x rdf:type ?var . table(?var, :type)
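A minimal Python sketch of that idea, assuming a small in-memory schema (the names and the in-memory set are illustrative, not what TDB or SDB actually do): compute the closure once at load time, then answer the query with a plain join against the table.

```python
# Toy schema and data; all names are hypothetical.
SCHEMA = {("GradStudent", "Student"), ("Student", "Person")}  # direct rdfs:subClassOf
TYPES = [("alice", "GradStudent"), ("bob", "Student"), ("acme", "Organization")]


def transitive_closure(pairs):
    """Return every (type, supertype) pair reachable in one or more steps."""
    closure = set(pairs)
    changed = True
    while changed:  # fine for a small, fixed schema
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


TABLE = transitive_closure(SCHEMA)  # built once, at load time


def instances_of(target, types, table):
    """?x rdf:type ?var . table(?var, :type)

    The equality check covers the zero-step case of subClassOf*.
    """
    return sorted(x for (x, var) in types if var == target or (var, target) in table)


print(instances_of("Person", TYPES, TABLE))
```

The query no longer walks paths; it is one scan of the rdf:type triples plus a membership test against a table small enough to keep in memory.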

LUBM is unusual in several ways. All systems I know of load faster on LUBM than on any other benchmark because it has a low node-to-triple ratio (i.e. it is very interconnected within each university). RDFS-level inference increases this effect because inference can add triples but cannot create new RDF terms. Loading a node means storing the bytes for the URI or literal, which is more work than loading a triple that only refers to existing nodes.

It would be easy to add this to TDB (the prototyping was for SDB where it's more important due to JDBC-isms) - doing it as part of the more general property tables would be interesting.

TDB scales much better than SDB (load and query).

        Andy
