As you know, Andy, I'm going to ISWC this year. Shall I buttonhole the authors
and give them our POV? :grin:

In all seriousness, from what I can tell the results amount to "Using older versions of our comparands, and without contacting the projects in question, we couldn't find a store that implements every property path feature correctly, and some fail entirely."

I'm not really sure how useful that information is...? But I am ready to do a benchmarking paper for next year. Seems like it's a lot easier than I thought!


ajs6f


Andy Seaborne wrote on 10/17/17 9:28 AM:
Hi Lorenz,

Looks like JENA-1195 which is fixed.  Does that look like it?

I think it is a shame when papers focus on bugs rather than discussing and even
fixing them.  Bugs aren't research.

Path evaluation could be improved to stream in more cases (that's why LIMIT
didn't help), but JENA-1195 explains the slowness and memory use.
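
To illustrate what "streaming" means for a query like {A B* ?o} LIMIT 100,
here is a minimal sketch in plain Python over an in-memory list of triples.
It is not Jena's code: the function name zero_or_more and the sample data are
made up for illustration. The traversal is a generator, so a LIMIT can stop it
early, and a visited set keeps it finite on cyclic data. It also shows the
result SPARQL 1.1 expects when the predicate does not occur in the data: A
exactly once, via the zero-length path.

```python
from collections import deque
from itertools import islice


def zero_or_more(triples, start, predicate):
    """Lazily yield each distinct node reachable from `start` via zero or
    more `predicate` edges (hypothetical helper, not a Jena API)."""
    # Build a small adjacency index for the one predicate we follow.
    # (This indexing step is eager; only the traversal below is lazy.)
    edges = {}
    for s, p, o in triples:
        if p == predicate:
            edges.setdefault(s, []).append(o)

    seen = {start}          # prevents revisiting nodes on cyclic data
    queue = deque([start])
    while queue:
        node = queue.popleft()
        yield node          # the start node is the zero-length match
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)


# Made-up sample data containing a cycle A -> x -> y -> A over predicate "B".
triples = [("A", "B", "x"), ("x", "B", "y"), ("y", "B", "A"), ("x", "C", "z")]

# islice plays the role of LIMIT 100: it stops pulling results once satisfied.
print(list(islice(zero_or_more(triples, "A", "B"), 100)))

# A predicate that does not occur in the data still matches A itself, once.
print(list(zero_or_more(triples, "A", "missing")))
```

Because the generator yields each node as soon as it is dequeued, a consumer
that needs only the first few results never forces the full traversal.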

    Andy

On 17/10/17 07:58, Lorenz B. wrote:
Hi,

I just walked through the papers for the upcoming ISWC conference and
found a paper about benchmarking of SPARQL property paths [1].

Not sure if this is relevant, but it looks like Jena has some issues
with different types of queries using property paths. For example,

SELECT ?o WHERE {A B* ?o.} LIMIT 100

leads to an OOM error on non-cyclic data. Here is the relevant part of
the paper:

While benchmarking Virtuoso, RDF4J and Allegrograph no errors or
exceptions have occurred. During the benchmark process of Jena an
OutOfMemoryError has been thrown whenever a query with the * operator
was used. In order to identify the cause of the error, the amount of
results the query should return has been limited to 100. The results
that have been returned by a query of the form SELECT ?o WHERE {A B*
?o.} LIMIT 100 where A and B are valid IRIs, consisted of 100 times A.
Due to this fact it is presumable that the query containing the *
operator returns A recursively until the main memory was full. To
ensure that this behaviour is not caused by cycles in the dataset a
query of the same form but with a predicate IRI that did not exist in
the dataset was executed. This query still returned 100 times A. This
indicates, that the * operator is not implemented correctly.
In addition, the experiments showed that:
Due to the problems with the * operator the queries 4, 7 and 8 could
not be processed. Additionally query 3, 5, and 6 returned no results
after 1 hour and thus, were aborted. Query 1 returned an empty and
thus, incomplete result set. Only for query 2 a valid result was
returned. Due to the lack of comparable results, Jena has been omitted
in the comparison of triple stores.

In the discussion section, they summarize the overall performance of Jena as follows:

Jena could not return results for any query in under 1 hour besides
query 2. Furthermore, the * operator could not be evaluated at all and
the inverse operator returned empty result sets.

It looks like they used version 3.0.1, so maybe this doesn't hold
anymore for all of the queries. If not, it could be interesting to
improve performance and/or completeness.

I hope I didn't miss some open JIRA ticket, but in general I just wanted
to highlight the existence of a published benchmark for this kind of
query.


Cheers,

Lorenz

[1] http://ceur-ws.org/Vol-1932/paper-04.pdf
