Hi all,
slightly off-topic, but given the ongoing ESWC 2023 conference, I want
to share two papers that might be of interest to some of you:
1. Join Ordering of SPARQL Property Path Queries
SPARQL property path queries provide a succinct way to write complex
navigational queries over RDF knowledge graphs. However, their
evaluation remains difficult as they may involve the execution of
transitive closures. As a result, many property path queries simply
time out when executed on public online RDF knowledge graphs. One
solution to speed up their execution is to find optimal join orders.
Although the join ordering problem has been extensively studied for
traditional SPARQL queries, the presence of property path patterns
biases existing approaches. In this paper, we focus on C2RPQ_UF queries
(conjunctive SPARQL property path queries with UNION and FILTER), and
we present a query optimizer that is able to capture the cost of
C2RPQ_UF queries using an appropriate cost model and a sampling-based
cardinality estimator. On the latest Wikidata Query Benchmark, we
empirically demonstrate that our approach finds significantly better
join orders than Virtuoso and BlazeGraph.
Paper:
https://2023.eswc-conferences.org/wp-content/uploads/2023/05/paper_Aimonier-Davat_2023_Join.pdf
Not directly related to Jena, but interesting anyway.
2. Evaluation of a Representative Selection of SPARQL Query Engines
using Wikidata
In this paper, we present an evaluation of the performance of five
representative RDF triplestores, including GraphDB, Jena Fuseki,
Neptune, RDFox, and Stardog, and one experimental SPARQL query engine,
QLever. We compare importing time, loading time, and exporting time
using a complete version of the knowledge graph Wikidata, and we also
evaluate query performances using 328 queries defined by Wikidata
users. To put this evaluation into context with respect to previous
evaluations, we also analyze the query performances of these systems
using a prominent synthetic benchmark: SP2Bench. We observed that most
of the systems we considered for the evaluation were able to complete
the execution of almost all the queries defined by Wikidata users
before the timeout we established. We noticed, however, that the time
needed by most systems to import and export Wikidata might be longer
than required in some industrial and academic projects, where
information is represented, enriched, and stored using different
representation means.
Paper:
https://2023.eswc-conferences.org/wp-content/uploads/2023/05/paper_Lam_2023_Evaluation.pdf
In the second paper, Jena TDB2 (v4.4.0) was used for the benchmark.
Cheers,
Lorenz