Hello,
as pointed out earlier, I'm having some issues with the new SPARQL
endpoint. I'm currently using DBpedia to generate dictionaries for a
task in an information extraction class I'm taking.
For this task, I need a list of entities, e.g. actors. Consider the
following query:
    SELECT ?name WHERE {
      ?a <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
         <http://dbpedia.org/ontology/Actor> .
      ?a <http://www.w3.org/2000/01/rdf-schema#label> ?name
    }
With DBpedia 3.2, this worked just fine. With the current release, the
query times out after a while and returns only a partial result list.
This is actually a feature called "Anytime" queries. [0]
I wonder if enabling the Anytime feature is a good idea - not because I
can't get my list of actors, but because it's broken, undocumented and
proprietary:
Kingsley Idehen wrote:
> There is a bit of a doc mess up right now. New docs are in progress etc..
> Re. SPARQL protocol, this should come through HTTP response, but we are
> still working on this part.
That's not exactly in the "SPARQL Protocol for RDF" recommendation.
There is no way right now to let a SPARQL-compliant client know there
are more results. AFAIK, it is also impossible to set these timeouts
using the SPARQL Protocol. I don't think proprietary protocol extensions
are the right thing for an Open project.
Additionally, handing out different result sets for the same query
depending on what kind of data is cached and how far subordinate clauses
from *previous* queries have been evaluated (see [0]) sounds broken. In
fact, I don't believe the SPARQL W3C recommendation allows that (section
12.5, "Evaluation Semantics").
I do acknowledge that handling web-scale data sets presents a problem,
but I'd rather see a query language which can do proper chunking of
results instead of breaking SPARQL.
Anyway, I tried to work around this issue by using the LIMIT and
OFFSET solution sequence modifiers. The W3C recommendation states:
"Using LIMIT and OFFSET to select different subsets of the query
solutions will not be useful unless the order is made predictable by
using ORDER BY." So I added an ORDER BY as well. This breaks after
some iterations:
22023 Error SR353: Sorted TOP clause specifies more then 10100 rows to
sort. Only 10000 are allowed. Either decrease the offset and/or row
count or use a scrollable cursor
SPARQL query:
    define sql:signal-void-variables 1
    define input:default-graph-uri <http://dbpedia.org>
    SELECT ?name WHERE {
      ?a <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
         <http://dbpedia.org/ontology/Actor> .
      ?a <http://www.w3.org/2000/01/rdf-schema#label> ?name
    }
    ORDER BY ?name
    LIMIT 100 OFFSET 10000
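
For reference, the paging scheme behind that query can be sketched roughly as follows. This is only an illustration, not the exact client code: the helper name paged_queries and the constant BASE_QUERY are made up for this example, and the sketch only builds the query strings without sending anything to the endpoint.

```python
from typing import Iterator

# The actor query from above, with ORDER BY so that LIMIT/OFFSET
# select predictable subsets. BASE_QUERY is a name made up for this sketch.
BASE_QUERY = (
    "SELECT ?name WHERE { "
    "?a <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
    "<http://dbpedia.org/ontology/Actor> . "
    "?a <http://www.w3.org/2000/01/rdf-schema#label> ?name } "
    "ORDER BY ?name"
)


def paged_queries(base_query: str, page_size: int, pages: int) -> Iterator[str]:
    """Yield one query string per page (hypothetical helper).

    Page n covers rows [n * page_size, (n + 1) * page_size).
    """
    for page in range(pages):
        offset = page * page_size
        yield f"{base_query} LIMIT {page_size} OFFSET {offset}"


if __name__ == "__main__":
    # Print only the LIMIT/OFFSET tail of the first few pages.
    for query in paged_queries(BASE_QUERY, page_size=100, pages=3):
        print(query[-20:])
```

With a page size of 100, the page at index 100 is the one shown in the error above: OFFSET 10000 plus LIMIT 100 adds up to the 10100 rows the server refuses to sort.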
Any ideas?
Regards,
Michael
[0] http://www.openlinksw.com/weblog/oerling/?id=1494