Re: [DBpedia-discussion] Slow queries and inconsistent results when filtering on resource URI

Kingsley Idehen Thu, 03 Aug 2017 11:54:09 -0700

On 8/3/17 2:32 PM, Paul Houle wrote:
> The following query runs well on the public SPARQL endpoint:
>
>     SELECT ?type (COUNT(*) AS ?count) {
>         ?station a dbo:CareerStation .
>         ?who dbo:careerStation ?station .
>         ?who a ?type .
>     } GROUP BY ?type ORDER BY DESC(?count)
>
> It completes in a few seconds. The only trouble is that there are
> 400,000 or so YAGO types in the system, so the query hits the
> 10,000 row limit.
>
> If I want only dbo types, I write the following query:
>
>     SELECT ?type (COUNT(*) AS ?count) {
>         ?station a dbo:CareerStation .
>         ?who dbo:careerStation ?station .
>         ?who a ?type .
>         FILTER(STRSTARTS(STR(?type),"http://dbpedia.org/ontology/"))
>     } GROUP BY ?type ORDER BY DESC(?count)
>
> which does not finish at all if I submit it with the rdflib SPARQL
> protocol client (giving the default graph http://dbpedia.org),
> failing with the error message "Virtuoso S1T00 Error SR171: Transaction
> timed out".
>
> That query *does* work through the web interface when it has the
> default "30000" timeout. However, I then find that the results
> depend on the timeout. For instance, with a timeout of 5000 I get
>
> type                                       count
> http://dbpedia.org/ontology/Person         15741
> http://dbpedia.org/ontology/Agent          15741
> http://dbpedia.org/ontology/Athlete        12378
> http://dbpedia.org/ontology/SoccerPlayer   12378
> http://dbpedia.org/ontology/SoccerManager  3363
> http://dbpedia.org/ontology/SportsManager  3363
>
>
> and with 10000 I get
>
> type                                       count
> http://dbpedia.org/ontology/Person         28962
> http://dbpedia.org/ontology/Agent          28962
> http://dbpedia.org/ontology/Athlete        23546
> http://dbpedia.org/ontology/SoccerPlayer   23546
> http://dbpedia.org/ontology/SoccerManager  5416
> http://dbpedia.org/ontology/SportsManager  5416
>
>
> This wasn't what I expected. What's up?
>


The larger the timeout, the larger the solution. This is what we call
the "Anytime Query" feature, i.e., partial results (flagged via HTTP
response header metadata) are returned for queries that don't complete
within the allotted timeout.
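
A client can check for this before trusting the counts. The following is
a minimal sketch, not an official client: it assumes the endpoint accepts
a `timeout` request parameter in milliseconds (as the web form does) and
that a partial result is flagged with an `X-SQL-State` header of `S1TAT`
or an `X-SQL-Message` header mentioning "incomplete results". Those header
names and values, the `format` parameter, and the helper names are
assumptions to verify against your own responses.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: SQL state value used to flag a partial ("anytime") result.
PARTIAL_STATE = "S1TAT"


def build_query_url(endpoint, query, timeout_ms,
                    default_graph="http://dbpedia.org"):
    """Build a GET URL for a SPARQL query with an explicit timeout."""
    params = {
        "default-graph-uri": default_graph,
        "query": query,
        "timeout": str(timeout_ms),  # milliseconds, as in the web form
        "format": "application/sparql-results+json",  # assumed parameter
    }
    return endpoint + "?" + urlencode(params)


def is_partial(headers):
    """Return True if the response headers suggest a partial result."""
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {k.lower(): v for k, v in headers.items()}
    state = lowered.get("x-sql-state", "")
    message = lowered.get("x-sql-message", "")
    return (state.strip() == PARTIAL_STATE
            or "incomplete results" in message.lower())


def run_query(endpoint, query, timeout_ms):
    """Run the query; return (raw response body, partial-result flag)."""
    url = build_query_url(endpoint, query, timeout_ms)
    with urlopen(Request(url)) as resp:
        body = resp.read()
        return body, is_partial(dict(resp.headers.items()))
```

If `run_query` reports a partial result, the counts reflect only what was
aggregated before the timeout and should not be compared across runs with
different timeouts.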

Remember, anybody could be running any combination of these kinds of
queries at any time. The DBMS challenge is all about offering fair use
of the DBpedia endpoint to the whole planet.

-- 
Regards,

Kingsley Idehen       
Founder & CEO 
OpenLink Software   (Home Page: http://www.openlinksw.com)

Weblogs (Blogs):
Legacy Blog: http://www.openlinksw.com/blog/~kidehen/
Blogspot Blog: http://kidehen.blogspot.com
Medium Blog: https://medium.com/@kidehen

Profile Pages:
Pinterest: https://www.pinterest.com/kidehen/
Quora: https://www.quora.com/profile/Kingsley-Uyi-Idehen
Twitter: https://twitter.com/kidehen
Google+: https://plus.google.com/+KingsleyIdehen/about
LinkedIn: http://www.linkedin.com/in/kidehen

Web Identities (WebID):
Personal: http://kingsley.idehen.net/public_home/kidehen/profile.ttl#i
        : http://id.myopenlink.net/DAV/home/KingsleyUyiIdehen/Public/kingsley.ttl#this



_______________________________________________
DBpedia-discussion mailing list
DBpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
