Hi All

 

I'm forwarding an issue that was raised by one of the users of my API
regarding unexpectedly long query times.  The email thread on this can be
found below, but it basically summarises as follows: he has a relatively
simple query that executes virtually instantly via the DBPedia SPARQL web
form at dbpedia.org/sparql or when evaluated through Jena, but runs very
slowly when evaluated via my API.

 

Now I thought this was slightly odd and probably not actually my API's fault,
so I went ahead and did some digging and found that if evaluated via an HTTP
GET through my API it is virtually instantaneous, but when evaluated using a
POST it takes a long time:

 

Sync Query (GET): 00:00:00.0983944

Result: True

1 Results

?ans = Molawin River@en

 

Sync Query (POST): 00:00:05.5907184

Result: True

1 Results

?ans = Molawin River@en

 

So this is a difference in query time of a factor of roughly 57, which seems
very strange to me, and I'm at a loss to explain it!
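
For anyone who wants to reproduce the comparison without going through my
API, something along the following lines should do.  This is only a rough
sketch: it assumes the desktop .NET HttpWebRequest class, the endpoint URL
http://dbpedia.org/sparql and a placeholder query, and the class and method
names are made up for illustration rather than being part of dotNetRDF.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Net;
using System.Text;

public static class GetVsPostTiming
{
    // GET form of the SPARQL protocol: the query travels in the URL.
    public static string BuildGetUri(string endpoint, string query)
    {
        return endpoint + "?query=" + Uri.EscapeDataString(query);
    }

    // POST form: the same parameter, sent as a URL-encoded form body.
    public static string BuildFormBody(string query)
    {
        return "query=" + Uri.EscapeDataString(query);
    }

    // Reads the whole response so the timing covers the full round trip.
    private static void Drain(HttpWebRequest request)
    {
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            reader.ReadToEnd();
        }
    }

    public static TimeSpan TimeGet(string endpoint, string query)
    {
        var timer = Stopwatch.StartNew();
        var request = (HttpWebRequest)WebRequest.Create(BuildGetUri(endpoint, query));
        request.Method = "GET";
        request.Accept = "application/sparql-results+xml";
        Drain(request);
        timer.Stop();
        return timer.Elapsed;
    }

    public static TimeSpan TimePost(string endpoint, string query)
    {
        var timer = Stopwatch.StartNew();
        var request = (HttpWebRequest)WebRequest.Create(endpoint);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.Accept = "application/sparql-results+xml";
        byte[] body = Encoding.UTF8.GetBytes(BuildFormBody(query));
        request.ContentLength = body.Length;
        using (var stream = request.GetRequestStream())
        {
            stream.Write(body, 0, body.Length);
        }
        Drain(request);
        timer.Stop();
        return timer.Elapsed;
    }

    public static void Main()
    {
        // Assumed endpoint and placeholder query; substitute the real query.
        const string endpoint = "http://dbpedia.org/sparql";
        const string query = "SELECT * WHERE { ?s ?p ?o } LIMIT 1";

        TimeSpan get = TimeGet(endpoint, query);
        TimeSpan post = TimePost(endpoint, query);
        Console.WriteLine("Sync Query (GET): " + get);
        Console.WriteLine("Sync Query (POST): " + post);
        Console.WriteLine("POST/GET ratio: " + (post.TotalMilliseconds / get.TotalMilliseconds));
    }
}
```

The first request also pays for connection set-up, so it is worth running
each variant a few times before comparing the numbers.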

 

To rule out POST simply having marginally more overhead than a GET, I've
tried the same query against other endpoints including FactForge, the LOD
Cloud (lod.openlinksw.com/sparql) and a local endpoint, and it seems to be
just DBPedia that has this issue.  This is despite both the LOD Cloud and
one of my local endpoints being Virtuoso-based, so it doesn't obviously
appear to be a Virtuoso-related issue, which was my first theory.

 

Does anyone know why there is this massive difference in query time for POST
vs GET and is it possible to fix this?

 

Regards,

 

Rob Vesse

 

 

From: Rob Vesse [mailto:[email protected]] 
Sent: 21 September 2011 13:00
To: Steve S
Cc: [email protected]
Subject: Re: [dotNetRDF-bugs] Query time

 

Hi Steve

 

Sorry for taking a while to get back to you but I've been rather busy of
late.

 

I put together some unit tests to reproduce this and what I found was rather
interesting.  It appears to be due to the fact that the async query API in
the Silverlight/WP7 builds always uses POST for simplicity and that DBPedia
appears to handle POST requests poorly.

 

Here are the results run against DBPedia:

 

Sync Query (GET): 00:00:00.0983944

Result: True

1 Results

?ans = Molawin River@en

 

Sync Query (POST): 00:00:05.5907184

Result: True

1 Results

?ans = Molawin River@en

 

Async Query: 00:00:05.6701002

Result: True

1 Results

?ans = Molawin River@en

 

Here are the results run against Factforge:

 

Sync Query (GET): 00:00:00.5197172

Result: True

1 Results

?ans = Molawin River@en

 

Sync Query (POST): 00:00:00.2956622

Result: True

1 Results

?ans = Molawin River@en

 

Async Query: 00:00:00.2214282

Result: True

1 Results

?ans = Molawin River@en

 

Here are the results run against a local endpoint (note this endpoint doesn't
have the data to answer the query; I was just trying to get an idea of
whether it was an issue of POST vs GET or an issue with the server):

 

Sync Query (GET): 00:00:00.0420308

Result: True

0 Results

 

Sync Query (POST): 00:00:00.0020514

Result: True

0 Results

 

Async Query: 00:00:00.0042980

Result: True

0 Results

 

So as you can see, for all three endpoints the Sync Query (GET) is virtually
instant, which is what you said you'd observed when using the web interface
or other SPARQL clients.  The odd thing is that for the other endpoints
there is virtually no difference in time between the GET and the POST
variants, whereas for DBPedia there is a massive difference, as you noted.

 

I will pass this issue on to the DBPedia folks and see what response, if any,
I get.  As a fix I can probably rewrite the async query APIs so that they
use GET for most queries, which should alleviate this issue for you.
I'll get back to you on this again in a couple of days' time.
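
To give a rough idea of what "GET for most queries" could look like, here is
a minimal sketch.  The class name, the method name and the 2048 character
cut-off are all assumptions made for illustration, not the actual dotNetRDF
implementation; the idea is simply to fall back to POST only when the
encoded query would make the request URL too long.

```csharp
using System;

public static class HttpMethodChooser
{
    // Hypothetical cut-off: long URLs may be rejected by servers or proxies,
    // so beyond this length the query goes in a POST body instead.
    public const int MaxGetUriLength = 2048;

    // Returns "GET" when the full request URI stays within the cut-off,
    // otherwise "POST".
    public static string ChooseMethod(string endpointUri, string query)
    {
        string getUri = endpointUri + "?query=" + Uri.EscapeDataString(query);
        return getUri.Length <= MaxGetUriLength ? "GET" : "POST";
    }
}
```

With a check like this, a short query such as yours would go out as a GET,
and only unusually long queries would still be sent via POST.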

 

Regards,

 

Rob Vesse

 

  _____  

From: "Steve S" <[email protected]>
Sent: 16 September 2011 11:03
To: "Rob Vesse" <[email protected]>
Subject: Query time

Hi Rob,

I've found a small issue with dotNetRDF and WP7.

I'm running the following query against the DBpedia endpoint:

PREFIX p: <http://dbpedia.org/property/>
PREFIX o: <http://dbpedia.org/ontology/>
PREFIX xs: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?ans WHERE
{ 
    ?s rdf:type o:River .
    ?s p:length ?l .                            
    FILTER (xs:integer(?l)) .
    OPTIONAL
    {
        ?s rdfs:label ?ans . 
        FILTER (lang(?ans ) = 'en')
    }
}
ORDER BY DESC(?l) 
LIMIT 1

However, it is taking 8 times longer to return the results than it takes
when I run the same query directly at dbpedia.org/sparql or through a Java
API (androjena).

The problem seems to be in the following method in the SPARQLRemoteEndpoint
class

public void QueryWithResultSet(String query, SparqlResultsCallback callback, Object state)
{
    request.BeginGetRequestStream(result =>
    {
        .
        .
        .
        
        System.Diagnostics.Debug.WriteLine("1: " + DateTime.Now); //test
        
        request.BeginGetResponse(innerResult =>
        {
            using (HttpWebResponse response = (HttpWebResponse)request.EndGetResponse(innerResult))
            {
                System.Diagnostics.Debug.WriteLine("2: " + DateTime.Now); //test
                
                .
                .
                .
            }
        }, null);
    }, null);
}

Any reason why it might be taking more time?

Thanks
Steve

 

------------------------------------------------------------------------------
All the data continuously generated in your IT infrastructure contains a
definitive record of customers, application performance, security
threats, fraudulent activity and more. Splunk takes this data and makes
sense of it. Business sense. IT sense. Common sense.
http://p.sf.net/sfu/splunk-d2dcopy1
_______________________________________________
dotNetRDF-bugs mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dotnetrdf-bugs
_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
