The SPARQL-WG is seriously considering making some changes to property
paths for the arbitrary length operators * (zero or more) and + (one or
more). The results of some queries may change.
There is no formal decision yet, so it is not definite that it will
happen.
The changes affect cardinality. Currently, * and + match all possible
paths, so if there are multiple ways to get from A to B, there will be
one row for each possible path.
Where paths include common elements, duplicates occur. In highly
connected graphs, there can be a lot of duplicates. For example, in a
clique of 6 nodes, :p* has 326 solutions, while it has 6 if unique. It
gets worse for larger N.
(A clique is a graph in which every node is connected to every other -
it's the most extreme form of highly connected).
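The 326 figure can be reproduced with a small model. This is only a
sketch, not ARQ code: it assumes a counting match is one row per path
that never revisits a node, starting from one fixed node and including
the zero-length path. The function names are made up for illustration.

```python
def clique(n):
    """Complete directed graph on n nodes: every node links to every other."""
    return {a: [b for b in range(n) if b != a] for a in range(n)}


def counting_matches(graph, start):
    """End node of every path from start that never revisits a node.

    The zero-length path is included, and there is one entry per path,
    so the same end node can appear many times.
    """
    ends = []

    def walk(node, visited):
        ends.append(node)
        for nxt in graph[node]:
            if nxt not in visited:
                walk(nxt, visited | {nxt})

    walk(start, {start})
    return ends


rows = counting_matches(clique(6), 0)
print(len(rows))       # 326 rows when every path is counted
print(len(set(rows)))  # 6 rows when only distinct end nodes are kept
```

Under the same assumption, a clique of 7 nodes gives 1957 rows against 7
distinct ones.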
In a FOAF graph, you usually want to know if A and B are connected, not
how many times (and if you did want that you'd probably want the length
as well and SPARQL 1.1 doesn't give you that).
rdfs:subClassOf* is another example:
{ :thing rdf:type/rdfs:subClassOf* ?class } is the class and all
superclasses of :thing. An RDFS schema isn't a tree - it can have other
acyclic shapes in it as well, such as two classes sharing a superclass
(directed cycles really would not make sense!). The app probably wants
each class once.
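A tiny diamond-shaped hierarchy shows the effect. This is a sketch with
a made-up schema (classes A to D and the dictionaries below are
hypothetical), and it assumes the hierarchy has no directed cycles.

```python
# Hypothetical schema: D has superclasses B and C, which share superclass A.
SUBCLASS_OF = {
    "D": ["B", "C"],
    "B": ["A"],
    "C": ["A"],
    "A": [],
}
TYPE_OF = {"thing": ["D"]}


def superclasses_counting(cls):
    """cls plus every superclass, one entry per route through the hierarchy.

    Assumes no directed cycles, otherwise the recursion would not end.
    """
    found = [cls]
    for parent in SUBCLASS_OF[cls]:
        found.extend(superclasses_counting(parent))
    return found


counting = [c for t in TYPE_OF["thing"] for c in superclasses_counting(t)]
distinct = sorted(set(counting))
print(counting)  # ['D', 'B', 'A', 'C', 'A'] - A is reached by two routes
print(distinct)  # ['A', 'B', 'C', 'D']
```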
But sometimes duplicates do matter.
See for example:
http://people.apache.org/~andy/property-paths.html
which adds up a purchase order to get the total cost. Just because
items have the same price (same literal or same structured node in the
graph) doesn't mean they can be considered the same.
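As a toy illustration (this is not the query from the page above, and
the item names and prices are invented), collapsing equal prices before
summing gives the wrong total:

```python
# Hypothetical order: two different items happen to cost the same.
order = [
    ("item1", 5),
    ("item2", 5),
    ("item3", 12),
]

prices_counting = [price for _, price in order]  # keeps duplicates
prices_distinct = set(prices_counting)           # equal prices collapse

print(sum(prices_counting))  # 22, the real total
print(sum(prices_distinct))  # 17, one of the 5s has been lost
```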
The current plan is to have two sets of operators, one counting
(keeping duplicates), one giving distinct matches.
These are:
counting: {*} and {+}
distinct: * and +
This is the way round that matches common usage, and anyway {...} can
generate duplicates in other forms.
But that is a change in semantics to * and + from the current SPARQL 1.1
where * and + are counting.
For some (many?) uses of * and + this makes no difference. For example,
accessing lists:
{ ?list rdf:rest*/rdf:first ?member }
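A sketch of why lists are unaffected: each list cell is reached by
exactly one path, so counting and distinct give the same cells. The cell
names and dictionaries are hypothetical, and the sketch assumes that
distinctness applies to the cells matched by rdf:rest*, not to the
member values.

```python
# Hypothetical RDF list of three cells; None stands in for rdf:nil,
# which has no rdf:first and so drops out of the pattern.
REST = {"c1": "c2", "c2": "c3", "c3": None}
FIRST = {"c1": "a", "c2": "b", "c3": "a"}


def rest_star(cell):
    """Cells matched by rdf:rest* from cell, one entry per path."""
    cells = []
    while cell is not None:
        cells.append(cell)
        cell = REST[cell]
    return cells


cells = rest_star("c1")
members = [FIRST[c] for c in cells]
print(cells)    # ['c1', 'c2', 'c3'] - no cell appears twice
print(members)  # ['a', 'b', 'a'] - the repeated member is still there
```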
Also, SPARQL 1.1 would have a general path operator DISTINCT(..path..)
to turn duplicates into distinct results for that path segment.
path* == DISTINCT(path{*})
path+ == DISTINCT(path{+})
If you have any comments, please make them soon.
The development ARQ (2.9.1-SNAPSHOT) will have these changes in it very
soon.
Andy