Hi,
I just went through the papers for the upcoming ISWC conference and
found a paper on benchmarking SPARQL property paths [1].
Not sure if this is relevant, but it looks like Jena has some issues
with several types of property path queries. For example, a query of
the form
SELECT ?o WHERE {A B* ?o.} LIMIT 100
returned A 100 times, and without the LIMIT it led to an OOM error, even
when cycles in the data were ruled out. Here is the relevant part of
the paper:
> While benchmarking Virtuoso, RDF4J and Allegrograph no errors or
> exceptions have occurred. During the benchmark process of Jena an
> OutOfMemoryError has been thrown whenever a query with the * operator
> was used. In order to identify the cause of the error, the amount of
> results the query should return has been limited to 100. The results
> that have been returned by a query of the form SELECT ?o WHERE {A B*
> ?o.} LIMIT 100 where A and B are valid IRIs, consisted of 100 times A.
> Due to this fact it is presumable that the query containing the *
> operator returns A recursively until the main memory was full. To
> ensure that this behaviour is not caused by cycles in the dataset a
> query of the same form but with a predicate IRI that did not exist in
> the dataset was executed. This query still returned 100 times A. This
> indicates, that the * operator is not implemented correctly.
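For comparison, here is a minimal sketch of the result I would expect from
such a query, assuming the usual SPARQL 1.1 reading of B* (the start node
plus everything reachable over B, each node at most once). This is not
Jena code; the function name zero_or_more and the toy triples are made up
for illustration only.

```python
from collections import deque


def zero_or_more(triples, start, predicate):
    """Return the set of nodes reachable from `start` via zero or more
    `predicate` edges. Each node is reported once (set semantics)."""
    seen = {start}            # zero-length path: the start node itself
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for s, p, o in triples:
            # Follow only edges with the requested predicate and skip
            # nodes already visited, so cycles cannot cause repetition.
            if s == node and p == predicate and o not in seen:
                seen.add(o)
                queue.append(o)
    return seen


# Hypothetical toy data with a cycle A -> x -> y -> A over predicate B.
triples = [
    ("A", "B", "x"),
    ("x", "B", "y"),
    ("y", "B", "A"),
    ("A", "C", "z"),
]

print(sorted(zero_or_more(triples, "A", "B")))
print(sorted(zero_or_more(triples, "A", "missing")))
```

With a predicate that does not occur in the data, the only match is the
zero-length path, so A should come back exactly once rather than 100 times.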
In addition, the experiments showed that:
> Due to the problems with the * operator the queries 4, 7 and 8 could
> not be processed. Additionally query 3, 5, and 6 returned no results
> after 1 hour and thus, were aborted. Query 1 returned an empty and
> thus, incomplete result set. Only for query 2 a valid result was
> returned. Due to the lack of comparable results, Jena has been omitted
> in the comparison of triple stores.
In the discussion section, they summarize the overall performance of Jena as follows:
> Jena could not return results for any query in under 1 hour besides
> query 2. Furthermore, the * operator could not be evaluated at all and
> the inverse operator returned empty result sets.
It looks like they used version 3.0.1, so maybe this no longer holds
for all of the queries. If it still does, it could be interesting to
improve performance and/or completeness.
I hope I didn't miss an open JIRA ticket; in general I just wanted
to point out that there is a published benchmark for these kinds of
queries.
Cheers,
Lorenz
[1] http://ceur-ws.org/Vol-1932/paper-04.pdf