Re: Property Paths benchmark @ ISWC2017

Marco Neumann Thu, 19 Oct 2017 09:11:03 -0700

Just a side note: since this is "only" a workshop contribution, it
will not be presented at the conference itself and will not
appear in the main ISWC 2017 conference proceedings published by
Springer, but only as an independent publication of the workshop
itself.


Responsibility for the workshop sits with the Organising Committee

Axel-Cyrille Ngonga Ngomo, Institute for Applied Informatics, Leipzig, Germany
Anastasia Krithara, National Center for Scientific Research “Demokritos”, Athens, Greece
Irini Fundulaki, ICS-FORTH, Heraklion, Crete, Greece

and, for review, with the Program Committee

Milos Jovanovik, OpenLink Software, United Kingdom
Pavlos Fafalios, University of Hannover, Germany
Kostas Stefanidis, University of Tampere, Finland
Muhammad Saleem, AKSW, University of Leipzig, Germany
Manolis Terrovitis, IMIS, RC Athena, Greece
Ricardo Usbeck, University of Leipzig, Germany
George Papastefanatos, IMIS RC Athena, Greece
Stasinos Kostantopoulos, NCSR Demokritos, Greece




On Thu, Oct 19, 2017 at 3:51 PM,  <[email protected]> wrote:
> I hadn't intended to spend time at the benchmarking sessions at ISWC, but if
> it seems useful, I can try and raise this issue in person. I suppose partly
> it's a question of setting the record straight, and then partly it's a
> question of standing up for good practice, and then it's also a question of
> protecting Jena from unmerited negative consequences.
>
> I don't know how widely used such benchmarks are. Except for a few
> high-profile projects, I rarely see anyone refer to this sort of evidence as
> a reason to adopt or not adopt a system.
>
>
> ajs6f
>
> Marco Neumann wrote on 10/19/17 9:26 AM:
>
>> Rob,
>>
>> unfortunately this is more common in Semantic Web research papers than
>> one might expect. I have seen this before, in particular with regard
>> to perceived shortcomings of Jena or its components. It might be a
>> good idea to bring this to the attention of affiliated people in the
>> organisation (here University of Southampton and Koblenz-Landau).
>>
>> While I don't think this is an intentional attempt to bring Jena into
>> disrepute, the situation could be clarified and addressed by the ISWC
>> workshop or track chair as well. I wish the "standard Industry and
>> research practice" you mention were more common than it currently
>> is.
>>
>> btw the thesis report is dated July 2016
>>
>>
>>
>> On Thu, Oct 19, 2017 at 12:08 PM, Rob Vesse <[email protected]> wrote:
>>>
>>> Marco
>>>
>>> I don’t believe anyone has tried to contact them yet
>>>
>>> I think that the complaints here are that there doesn’t appear to have
>>> been any attempt to report the issues identified back to the projects
>>> studied. If this was a security flaw in the project the standard Industry
>>> and research practice would be to make a responsible disclosure to the
>>> projects in advance of the public disclosure such that the researchers and
>>> projects can work together to resolve the problem. The implication being
>>> that it is irresponsible for the authors to benefit from pointing out flaws
>>> in the projects while appearing to make no efforts to help report/resolve
>>> those issues.
>>>
>>> As you suggest this paper does appear to be based upon some thesis work,
>>> that thesis indicates that the research was originally carried out in 2015
>>> implying that the author knew of the issue two years ago.
>>>
>>> The project has a relatively small core of developers, most of whom work
>>> on Jena on the side. We very much rely upon the wider community to provide
>>> input on bugs that need to be resolved, e.g. performance issues, and the
>>> features we should prioritise. When someone clearly knew of a problem but
>>> didn’t tell us, that is inevitably frustrating for the project.
>>>
>>> Rob
>>>
>>> On 19/10/2017 10:08, "Marco Neumann" <[email protected]> wrote:
>>>
>>>     did you try to contact Daniel Janke, Adrian Skubella or Steffen Staab
>>>     to get a response?
>>>
>>>     the findings seem to be based on work that has been published online as
>>>     part of a bachelor’s thesis by Adrian Skubella.
>>>
>>>
>>> https://west.uni-koblenz.de/sites/default/files/studying/theses-files/bachelorarbeit-adrian-skubella-benchmarks-for-sparql-property-paths.pdf
>>>
>>>
>>>
>>>     On Thu, Oct 19, 2017 at 10:54 AM, Lorenz B. <[email protected]> wrote:
>>>     > For me this is really bad practice. It also looks like they did the
>>>     > benchmark more than one year ago. Otherwise due to JENA-1195 this error
>>>     > wouldn't occur anymore. And submission deadline was August 6th, 2017.
>>>     > Their experiments contain 8 queries, rerunning those shouldn't take ages...
>>>     >
>>>     > I'm currently trying to reproduce the results of the paper, but the
>>>     > whole experimental setup remains unclear. I'm wondering if they used
>>>     > just the Jena CLI or TDB. The same holds for RDF4J. I'm puzzled because
>>>     > the runtimes in the eval section are quite small, but even loading the
>>>     > data of their benchmark takes much more time. So maybe they used the
>>>     > RDF4J server.
>>>     >
>>>     > The worst thing is that they didn't contact any of the developers. Or
>>>     > did they talk to somebody here and then Andy created the ticket
>>>     > JENA-1195? Also for the other queries that failed, I would expect to see
>>>     > tickets on Apache JIRA or at least a hint on the Jena mailing list...
>>>     >
>>>     > @Andy I'm also wondering whether JENA-1317 addresses the problem with
>>>     > the empty result of benchmark query containing an inverse property path.
>>>     >
>>>     >
>>>     > On 18.10.2017 17:03, [email protected] wrote:
>>>     >> As you know, Andy, I'm going to ISWC this year-- shall I buttonhole
>>>     >> them and give them our POV? :grin:
>>>     >>
>>>     >> In all seriousness, from what I can tell the results amount to "Using
>>>     >> older versions of our comparands and without contacting the projects
>>>     >> in question we couldn't find a store that implements every property
>>>     >> path feature correctly and some fail entirely."
>>>     >>
>>>     >> I'm not really sure how useful that information is...? But I am ready
>>>     >> to do a benchmarking paper for next year. Seems like it's a lot easier
>>>     >> than I thought!
>>>     >>
>>>     >>
>>>     >> ajs6f
>>>     >>
>>>     >>
>>>     >> Andy Seaborne wrote on 10/17/17 9:28 AM:
>>>     >>> Hi Lorenz,
>>>     >>>
>>>     >>> Looks like JENA-1195 which is fixed.  Does that look like it?
>>>     >>>
>>>     >>> I think it is a shame when papers focus on bugs rather than discussing
>>>     >>> and even fixing them.  Bugs aren't research.
>>>     >>>
>>>     >>> Path evaluation could be improved to stream in more cases (that's why
>>>     >>> LIMIT didn't help), but 1195 explains the slowness and memory.
>>>     >>>
>>>     >>>     Andy
>>>     >>>
>>>     >>> On 17/10/17 07:58, Lorenz B. wrote:
>>>     >>>> Hi,
>>>     >>>>
>>>     >>>> I just walked through the papers for the upcoming ISWC conference and
>>>     >>>> found a paper about benchmarking of SPARQL property paths [1].
>>>     >>>>
>>>     >>>> Not sure if this is relevant, but it looks like Jena has some issues
>>>     >>>> with different types of queries using the property path. For example,
>>>     >>>>
>>>     >>>> SELECT ?o WHERE {A B* ?o.} LIMIT 100
>>>     >>>>
>>>     >>>> leads to an OOM error on non-cyclic data. Here is the relevant part of
>>>     >>>> the paper:
>>>     >>>>
>>>     >>>>> While benchmarking Virtuoso, RDF4J and Allegrograph no errors or
>>>     >>>>> exceptions have occurred. During the benchmark process of Jena an
>>>     >>>>> OutOfMemoryError has been thrown whenever a query with the * operator
>>>     >>>>> was used. In order to identify the cause of the error, the amount of
>>>     >>>>> results the query should return has been limited to 100. The results
>>>     >>>>> that have been returned by a query of the form SELECT ?o WHERE {A B*
>>>     >>>>> ?o.} LIMIT 100 where A and B are valid IRIs, consisted of 100 times A.
>>>     >>>>> Due to this fact it is presumable that the query containing the *
>>>     >>>>> operator returns A recursively until the main memory was full. To
>>>     >>>>> ensure that this behaviour is not caused by cycles in the dataset a
>>>     >>>>> query of the same form but with a predicate IRI that did not exist in
>>>     >>>>> the dataset was executed. This query still returned 100 times A. This
>>>     >>>>> indicates, that the * operator is not implemented correctly.
>>>     >>>> In addition, the experiments showed that:
>>>     >>>>> Due to the problems with the * operator the queries 4, 7 and 8 could
>>>     >>>>> not be processed. Additionally query 3, 5, and 6 returned no results
>>>     >>>>> after 1 hour and thus, were aborted. Query 1 returned an empty and
>>>     >>>>> thus, incomplete result set. Only for query 2 a valid result was
>>>     >>>>> returned. Due to the lack of comparable results, Jena has been omitted
>>>     >>>>> in the comparison of triple stores.
>>>     >>>>
>>>     >>>> In the discussion section, they summarize the overall performance of
>>>     >>>> Jena by
>>>     >>>>
>>>     >>>>> Jena could not return results for any query in under 1 hour besides
>>>     >>>>> query 2. Furthermore, the * operator could not be evaluated at all and
>>>     >>>>> the inverse operator returned empty result sets.
>>>     >>>>
>>>     >>>> It looks like they used version 3.0.1, so maybe this doesn't hold
>>>     >>>> anymore for all of the queries. If not, it could be interesting to
>>>     >>>> improve performance and/or completeness.
>>>     >>>>
>>>     >>>> I hope I didn't miss some open JIRA ticket, but in general I just
>>>     >>>> wanted to highlight the presence of a published benchmark for that
>>>     >>>> kind of query.
>>>     >>>>
>>>     >>>>
>>>     >>>> Cheers,
>>>     >>>>
>>>     >>>> Lorenz
>>>     >>>>
>>>     >>>> [1] http://ceur-ws.org/Vol-1932/paper-04.pdf
>>>     >>>>
>>>     >
>>>
>>>
>>>
>>>     --
>>>
>>>
>>>     ---
>>>     Marco Neumann
>>>     KONA
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>
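For reference, here is a minimal sketch of what the query form quoted from the paper, SELECT ?o WHERE {A B* ?o.} LIMIT 100, should return under SPARQL 1.1 semantics as I understand them: A itself (the zero-length path) plus every node reachable from A over B edges, each reported once. This is illustrative Python only, not Jena code; the function names and the toy triples are made up for the example.

```python
from collections import deque
from itertools import islice


def zero_or_more(triples, start, predicate):
    """Lazily yield each node reachable from `start` over zero or more
    `predicate` edges. Every node is yielded exactly once."""
    # Index the edges that use the requested predicate by their subject.
    successors = {}
    for s, p, o in triples:
        if p == predicate:
            successors.setdefault(s, []).append(o)

    visited = {start}          # guards against cycles and duplicates
    queue = deque([start])     # breadth-first frontier
    while queue:
        node = queue.popleft()
        yield node             # the start node matches via the zero-length path
        for nxt in successors.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(nxt)


def select_limit(triples, start, predicate, limit):
    """Rough analogue of SELECT ?o WHERE { start predicate* ?o } LIMIT limit."""
    return list(islice(zero_or_more(triples, start, predicate), limit))


# Hypothetical toy data: a small cycle over predicate "B" plus an unrelated edge.
triples = [
    ("A", "B", "x"),
    ("x", "B", "y"),
    ("y", "B", "A"),
    ("A", "C", "z"),
]

print(select_limit(triples, "A", "B", 100))
print(select_limit(triples, "A", "no-such-predicate", 100))
```

With a predicate that does not occur in the data, the generator yields A exactly once rather than 100 times, and because results are produced lazily, a LIMIT can stop the evaluation early.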



-- 


---
Marco Neumann
KONA
