Did you try to contact Daniel Janke, Adrian Skubella or Steffen Staab to get a response?
The findings seem to be based on work that was published online as part of a bachelor's thesis by Adrian Skubella:
https://west.uni-koblenz.de/sites/default/files/studying/theses-files/bachelorarbeit-adrian-skubella-benchmarks-for-sparql-property-paths.pdf

On Thu, Oct 19, 2017 at 10:54 AM, Lorenz B. <[email protected]> wrote:
> For me this is really bad practice. It also looks like they did the
> benchmark more than a year ago; otherwise, due to JENA-1195, this error
> wouldn't occur anymore. And the submission deadline was August 6th, 2017.
> Their experiments contain 8 queries; rerunning those shouldn't take ages...
>
> I'm currently trying to reproduce the results of the paper, but the
> whole experimental setup remains unclear. I'm wondering if they used
> just the Jena CLI or TDB. The same holds for RDF4J. I'm puzzled because
> the runtimes in the eval section are quite small, but even loading the
> data of their benchmark takes much more time. So maybe they used the
> RDF4J server.
>
> The worst thing is that they didn't contact any of the developers. Or
> did they talk to somebody here and then Andy created the ticket
> JENA-1195? Also, for the other queries that failed, I would expect to see
> tickets on Apache JIRA or at least a hint on the Jena mailing list...
>
> @Andy I'm also wondering whether JENA-1317 addresses the problem with
> the empty result of the benchmark query containing an inverse property path.
>
>
> On 18.10.2017 17:03, [email protected] wrote:
>> As you know, Andy, I'm going to ISWC this year. Shall I buttonhole
>> them and give them our POV? :grin:
>>
>> In all seriousness, from what I can tell the results amount to "Using
>> older versions of our comparands and without contacting the projects
>> in question, we couldn't find a store that implements every property
>> path feature correctly, and some fail entirely."
>>
>> I'm not really sure how useful that information is...? But I am ready
>> to do a benchmarking paper for next year. Seems like it's a lot easier
>> than I thought!
>>
>>
>> ajs6f
>>
>>
>> Andy Seaborne wrote on 10/17/17 9:28 AM:
>>> Hi Lorenz,
>>>
>>> Looks like JENA-1195, which is fixed. Does that look like it?
>>>
>>> I think it is a shame when papers focus on bugs rather than discussing
>>> and even fixing them. Bugs aren't research.
>>>
>>> Path evaluation could be improved to stream in more cases (that's why
>>> LIMIT didn't help), but 1195 explains the slowness
>>> and memory.
>>>
>>> Andy
>>>
>>> On 17/10/17 07:58, Lorenz B. wrote:
>>>> Hi,
>>>>
>>>> I just walked through the papers for the upcoming ISWC conference and
>>>> found a paper about benchmarking of SPARQL property paths [1].
>>>>
>>>> Not sure if this is relevant, but it looks like Jena has some issues
>>>> with different types of queries using property paths. For example,
>>>>
>>>> SELECT ?o WHERE {A B* ?o.} LIMIT 100
>>>>
>>>> led to an OOM error on non-cyclic data. Here is the relevant part of
>>>> the paper:
>>>>
>>>>> While benchmarking Virtuoso, RDF4J and Allegrograph no errors or
>>>>> exceptions have occurred. During the benchmark process of Jena an
>>>>> OutOfMemoryError has been thrown whenever a query with the * operator
>>>>> was used. In order to identify the cause of the error, the amount of
>>>>> results the query should return has been limited to 100. The results
>>>>> that have been returned by a query of the form SELECT ?o WHERE {A B*
>>>>> ?o.} LIMIT 100 where A and B are valid IRIs, consisted of 100 times A.
>>>>> Due to this fact it is presumable that the query containing the *
>>>>> operator returns A recursively until the main memory was full. To
>>>>> ensure that this behaviour is not caused by cycles in the dataset a
>>>>> query of the same form but with a predicate IRI that did not exist in
>>>>> the dataset was executed. This query still returned 100 times A. This
>>>>> indicates, that the * operator is not implemented correctly.
>>>>
>>>> In addition, the experiments showed that:
>>>>> Due to the problems with the * operator the queries 4, 7 and 8 could
>>>>> not be processed. Additionally query 3, 5, and 6 returned no results
>>>>> after 1 hour and thus, were aborted. Query 1 returned an empty and
>>>>> thus, incomplete result set. Only for query 2 a valid result was
>>>>> returned. Due to the lack of comparable results, Jena has been omitted
>>>>> in the comparison of triple stores.
>>>>
>>>> In the discussion section, they summarize the overall performance of
>>>> Jena as follows:
>>>>
>>>>> Jena could not return results for any query in under 1 hour besides
>>>>> query 2. Furthermore, the * operator could not be evaluated at all and
>>>>> the inverse operator returned empty result sets.
>>>>
>>>> It looks like they used version 3.0.1, so maybe this doesn't hold
>>>> anymore for all of the queries. If not, it could be interesting to
>>>> improve performance and/or completeness.
>>>>
>>>> I hope I didn't miss an open JIRA ticket, but in general I just
>>>> wanted to highlight the existence of a published benchmark for these
>>>> kinds of queries.
>>>>
>>>>
>>>> Cheers,
>>>>
>>>> Lorenz
>>>>
>>>> [1] http://ceur-ws.org/Vol-1932/paper-04.pdf
>>>>

-- 
---
Marco Neumann
KONA
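For reference, here is a minimal, purely illustrative Python sketch of the behaviour the paper was probing. Under SPARQL 1.1 semantics, `A B* ?o` should return each node reachable from A via zero or more B edges exactly once. The zero-length path always contributes A itself, so a predicate that does not occur in the data yields A once, not 100 times. Evaluating the path lazily, as a generator, is what lets a LIMIT stop the walk early; that is the streaming point Andy raised. `build_index` and `zero_or_more` are hypothetical helpers over an assumed in-memory list of (s, p, o) string triples. They are not Jena or RDF4J APIs, and this is not how either store actually implements paths.

```python
from collections import deque
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Set, Tuple

# Assumed in-memory representation (hypothetical): (subject, predicate, object).
Triple = Tuple[str, str, str]
Index = Dict[str, Dict[str, List[str]]]


def build_index(triples: Iterable[Triple]) -> Index:
    """Index triples as predicate -> subject -> list of objects (hypothetical helper)."""
    index: Index = {}
    for s, p, o in triples:
        index.setdefault(p, {}).setdefault(s, []).append(o)
    return index


def zero_or_more(index: Index, start: str, pred: str) -> Iterator[str]:
    """Lazily evaluate `start pred* ?o`, yielding every reachable node exactly once.

    The zero-length path means `start` itself is always yielded, even when
    `pred` never occurs in the data. The `visited` set keeps cycles from
    producing duplicates or an endless walk.
    """
    edges = index.get(pred, {})
    visited: Set[str] = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        yield node  # streamed: a consumer applying a LIMIT can stop pulling here
        for nxt in edges.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(nxt)


if __name__ == "__main__":
    # Small graph with a cycle A -> C -> D -> A over predicate B.
    idx = build_index([("A", "B", "C"), ("C", "B", "D"), ("D", "B", "A")])
    print(list(zero_or_more(idx, "A", "B")))        # ['A', 'C', 'D']
    print(list(zero_or_more(idx, "A", "missing")))  # ['A']
    # LIMIT 2 analogue: islice pulls only two results, so the walk stops early.
    print(list(islice(zero_or_more(idx, "A", "B"), 2)))  # ['A', 'C']
```

A breadth-first walk with a visited set is one simple way to get the "each node once" result. The generator form matters for the LIMIT case: nothing beyond the requested rows is ever computed.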
