The SPARQL-WG is seriously considering making some changes to property paths for the arbitrary length operators * (zero or more) and + (one or more). The results of some queries may change.

There is no formal decision yet, so it is not certain this will happen.

The changes affect cardinality. Currently, * and + match all possible paths, so if there are multiple ways to get from A to B, there will be one row for each possible path.

Where paths include common elements, duplicates occur. In highly connected graphs, there can be a lot of duplicates. For example, in a clique of 6 nodes, :p* has 326 solutions, while it has 6 if unique. It gets worse for larger N.

(A clique is a graph in which every node is connected to every other - it's the most extreme form of highly connected.)
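
The clique figures can be reproduced with a small sketch. It is not ARQ code and not the formal SPARQL algebra: it assumes that "counting" means one solution per simple path (no node visited twice) from one fixed start node, including the zero-length path. Under that assumption the numbers match the ones quoted above. The function names are made up for illustration.

```python
def clique(n):
    """Adjacency lists for a clique: every node links to every other node."""
    return {a: [b for b in range(n) if b != a] for a in range(n)}


def path_ends(graph, node, visited):
    """Yield the end node of every simple path starting at `node`.

    Assumption: a path may not revisit a node. The zero-length path
    (ending at `node` itself) is included, as for the * operator.
    """
    yield node
    for nxt in graph[node]:
        if nxt not in visited:
            yield from path_ends(graph, nxt, visited | {nxt})


ends = list(path_ends(clique(6), 0, {0}))
counting_rows = len(ends)       # one row per path
distinct_rows = len(set(ends))  # one row per reachable node

print(counting_rows, distinct_rows)  # 326 6
```

Changing the 6 to 7 or 8 shows how quickly the counting figure grows while the distinct figure only grows by one per node.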

In a FOAF graph, you usually want to know if A and B are connected, not how many times (and if you did want that, you'd probably want the path length as well, and SPARQL 1.1 doesn't give you that).

rdfs:subClassOf* is another example:

{ :thing rdf:type/rdfs:subClassOf* ?class } is the class and all superclasses of :thing. An RDFS schema isn't necessarily a tree - it can contain shapes where a superclass is reachable by more than one route (directed cycles really would not make sense!). The app probably wants each class once.
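
As a rough illustration of the schema case, here is a sketch with a made-up four-class hierarchy (the names A, B, C, D are hypothetical, not from any real vocabulary) in which D has two direct superclasses that share a superclass A. It assumes the counting reading gives one row per route through the hierarchy.

```python
from collections import Counter

# Hypothetical schema: D subClassOf B, D subClassOf C, B subClassOf A, C subClassOf A
SUBCLASS_OF = {"D": ["B", "C"], "B": ["A"], "C": ["A"], "A": []}


def superclasses_counting(cls):
    """Yield cls and its superclasses, once per route (counting reading)."""
    yield cls
    for parent in SUBCLASS_OF[cls]:
        yield from superclasses_counting(parent)


rows = Counter(superclasses_counting("D"))
print(sum(rows.values()))  # 5 rows under counting
print(len(rows))           # 4 rows if distinct
print(rows["A"])           # 2: A is reached via B and via C
```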

But sometimes duplicates do matter.

See for example:
http://people.apache.org/~andy/property-paths.html

which adds up a purchase order to get the total cost. Just because items have the same price (same literal or same structured node in the graph) doesn't mean they can be considered the same item.
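
A tiny sketch of why this matters for totals. The order data is invented for illustration and is not taken from the page linked above.

```python
# Hypothetical purchase order: two different items happen to cost the same.
order_lines = [("item1", 5), ("item2", 5), ("item3", 12)]

# Keeping duplicates: every order line contributes its price.
counting_total = sum(price for _, price in order_lines)

# Collapsing equal prices first silently drops one of the two 5s.
distinct_total = sum({price for _, price in order_lines})

print(counting_total)  # 22
print(distinct_total)  # 17
```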

The current plan is to have two sets of operators, one counting (keeping duplicates) and one returning distinct matches.

There are:
  counting: {*} and {+}
  distinct: * and +

This way round because distinct is the common usage, and anyway {...} can already generate duplicates in its other forms.

But that is a change in semantics to * and + from the current SPARQL 1.1 where * and + are counting.

For some (many?) uses of * and + this makes no difference. For example, accessing lists:

   { ?list rdf:rest*/rdf:first ?member }

Also, SPARQL 1.1 would have a general path operator DISTINCT(..path..) to turn duplicates into distinct results for that path segment.

path* == DISTINCT(path{*})
path+ == DISTINCT(path{+})

If you have any comments, please make them soon.

The development ARQ (2.9.1-SNAPSHOT) will have these changes in it very soon.

        Andy
