Re: Property Paths benchmark @ ISWC2017

Marco Neumann Thu, 19 Oct 2017 02:08:28 -0700

Did you try contacting Daniel Janke, Adrian Skubella or Steffen Staab
to get a response?


The findings seem to be based on work that has been published online as
part of a bachelor’s thesis by Adrian Skubella.

https://west.uni-koblenz.de/sites/default/files/studying/theses-files/bachelorarbeit-adrian-skubella-benchmarks-for-sparql-property-paths.pdf



On Thu, Oct 19, 2017 at 10:54 AM, Lorenz B. <[email protected]> wrote:
> For me this is really bad practice. It also looks like they did the
> benchmark more than one year ago. Otherwise due to JENA-1195 this error
> wouldn't occur anymore. And the submission deadline was August 6th, 2017.
> Their experiments contain 8 queries, rerunning those shouldn't take ages...
>
> I'm currently trying to reproduce the results of the paper, but the
> whole experimental setup remains unclear. I'm wondering if they used
> just the Jena CLI or TDB. The same holds for RDF4J. I'm puzzled because
> the runtimes in the eval section are quite small, but even loading the
> data of their benchmark takes much more time. So maybe they used the
> RDF4J server.
>
> The worst thing is that they didn't contact any of the developers. Or
> did they talk to somebody here and then Andy created the ticket
> JENA-1195? Also for the other queries that failed, I would expect to see
> tickets on Apache JIRA or at least a hint on the Jena mailing list...
>
> @Andy I'm also wondering whether JENA-1317 addresses the problem with
> the empty result of the benchmark query containing an inverse property path.
>
>
> On 18.10.2017 17:03, [email protected] wrote:
>> As you know, Andy, I'm going to ISWC this year-- shall I buttonhole
>> them and give them our POV? :grin:
>>
>> In all seriousness, from what I can tell the results amount to "Using
>> older versions of our comparands and without contacting the projects
>> in question we couldn't find a store that implements every property
>> path feature correctly and some fail entirely."
>>
>> I'm not really sure how useful that information is...? But I am ready
>> to do a benchmarking paper for next year. Seems like it's a lot easier
>> than I thought!
>>
>>
>> ajs6f
>>
>>
>> Andy Seaborne wrote on 10/17/17 9:28 AM:
>>> Hi Lorenz,
>>>
>>> Looks like JENA-1195 which is fixed.  Does that look like it?
>>>
>>> I think it is a shame when papers focus on bugs rather than discussing
>>> and even fixing them.  Bugs aren't research.
>>>
>>> Path evaluation could be improved to stream in more cases (that's why
>>> LIMIT didn't help), but JENA-1195 explains the slowness and memory use.
>>>
>>>     Andy
>>>
>>> On 17/10/17 07:58, Lorenz B. wrote:
>>>> Hi,
>>>>
>>>> I just went through the papers for the upcoming ISWC conference and
>>>> found a paper about benchmarking SPARQL property paths [1].
>>>>
>>>> Not sure if this is relevant, but it looks like Jena has some issues
>>>> with different types of queries using property paths. For example,
>>>>
>>>> SELECT ?o WHERE {A B* ?o.} LIMIT 100
>>>>
>>>> led to an OOM error on non-cyclic data. Here is the relevant part of
>>>> the paper:
>>>>
>>>>> While benchmarking Virtuoso, RDF4J and Allegrograph no errors or
>>>>> exceptions have occurred. During the benchmark process of Jena an
>>>>> OutOfMemoryError has been thrown whenever a query with the * operator
>>>>> was used. In order to identify the cause of the error, the amount of
>>>>> results the query should return has been limited to 100. The results
>>>>> that have been returned by a query of the form SELECT ?o WHERE {A B*
>>>>> ?o.} LIMIT 100 where A and B are valid IRIs, consisted of 100 times A.
>>>>> Due to this fact it is presumable that the query containing the *
>>>>> operator returns A recursively until the main memory was full. To
>>>>> ensure that this behaviour is not caused by cycles in the dataset a
>>>>> query of the same form but with a predicate IRI that did not exist in
>>>>> the dataset was executed. This query still returned 100 times A. This
>>>>> indicates, that the * operator is not implemented correctly.
>>>> In addition, the experiments showed that:
>>>>> Due to the problems with the * operator the queries 4, 7 and 8 could
>>>>> not be processed. Additionally query 3, 5, and 6 returned no results
>>>>> after 1 hour and thus, were aborted. Query 1 returned an empty and
>>>>> thus, incomplete result set. Only for query 2 a valid result was
>>>>> returned. Due to the lack of comparable results, Jena has been omitted
>>>>> in the comparison of triple stores.
>>>>
>>>> In the discussion section, they summarize the overall performance of
>>>> Jena as follows:
>>>>
>>>>> Jena could not return results for any query in under 1 hour besides
>>>>> query 2. Furthermore, the * operator could not be evaluated at all and
>>>>> the inverse operator returned empty result sets.
>>>>
>>>> It looks like they used version 3.0.1, so maybe this doesn't hold
>>>> anymore for all of the queries. If not, it could be interesting to
>>>> improve performance and/or completeness.
>>>>
>>>> I hope I didn't miss some open JIRA ticket, but in general I just
>>>> wanted to highlight the presence of a published benchmark for this
>>>> kind of query.
>>>>
>>>>
>>>> Cheers,
>>>>
>>>> Lorenz
>>>>
>>>> [1] http://ceur-ws.org/Vol-1932/paper-04.pdf
>>>>
>



-- 


---
Marco Neumann
KONA
