On 9/21/11 8:14 AM, Rob Vesse wrote:

Hi All

I'm forwarding an issue that was raised by one of the users of my API with regard to unexpectedly long query times. The email thread is included below, but it basically summarises as follows: he has a relatively simple query that executes virtually instantly via the DBPedia SPARQL web form at dbpedia.org/sparql or when evaluated through Jena, but runs very slowly when evaluated via my API.

Now I thought this was slightly odd and probably not actually my API's fault, so I did some digging and found that if the query is evaluated via an HTTP GET through my API it is virtually instantaneous, but when evaluated using a POST it takes a long time:

Sync Query (GET): 00:00:00.0983944

Result: True

1 Results

?ans = Molawin River@en

Sync Query (POST): 00:00:05.5907184

Result: True

1 Results

?ans = Molawin River@en

So this is a factor of roughly 57 difference in query time. This seems very strange to me and I'm at a loss to explain it!
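The ratio can be recomputed from the two timings reported above. This is a minimal sketch in Python; the helper name `parse_elapsed` is made up for illustration, and it assumes the timings are in the `hh:mm:ss.fffffff` form shown in the output.

```python
def parse_elapsed(text: str) -> float:
    """Convert an 'hh:mm:ss.fffffff' elapsed-time string to seconds."""
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


# Timings copied from the DBPedia results above.
get_seconds = parse_elapsed("00:00:00.0983944")
post_seconds = parse_elapsed("00:00:05.5907184")

# How many times slower the POST was than the GET (about 56.8).
ratio = post_seconds / get_seconds
print(f"POST/GET ratio: {ratio:.1f}")
```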

To rule out POST simply having marginally more overhead than GET, I tried the same query against other endpoints, including FactForge, the LOD Cloud (lod.openlinksw.com/sparql) and a local endpoint, and it seems to be just DBPedia that has this issue. This is despite both the LOD Cloud and one of my local endpoints being Virtuoso based, so it doesn't appear to be a Virtuoso-related issue, which was my first theory.


lod.openlinksw.com is an 8-node cluster edition of Virtuoso where each cluster node is hosted by a cluster blade (a server machine). DBpedia is a smaller setup, with 4 Virtuoso cluster nodes all hosted on a single machine.

Does anyone know why there is this massive difference in query time for POST vs GET and is it possible to fix this?


Anyway, we'll look to see what's going on here re. the DBpedia setup, e.g., reverse proxy use etc.

Kingsley

Regards,

Rob Vesse

*From:*Rob Vesse [mailto:[email protected]]
*Sent:* 21 September 2011 13:00
*To:* Steve S
*Cc:* [email protected]
*Subject:* Re: [dotNetRDF-bugs] Query time

Hi Steve

Sorry for taking a while to get back to you but I've been rather busy of late.

I put together some unit tests to reproduce this and what I found was rather interesting. It appears to be due to the fact that the async query API in the Silverlight/WP7 builds always uses POST for simplicity and that DBPedia appears to handle POST requests poorly.

Here are the results run against DBPedia:

Sync Query (GET): 00:00:00.0983944

Result: True

1 Results

?ans = Molawin River@en

Sync Query (POST): 00:00:05.5907184

Result: True

1 Results

?ans = Molawin River@en

Async Query: 00:00:05.6701002

Result: True

1 Results

?ans = Molawin River@en

Here are the results run against FactForge:

Sync Query (GET): 00:00:00.5197172

Result: True

1 Results

?ans = Molawin River@en

Sync Query (POST): 00:00:00.2956622

Result: True

1 Results

?ans = Molawin River@en

Async Query: 00:00:00.2214282

Result: True

1 Results

?ans = Molawin River@en

Here are the results run against a local endpoint (note that this endpoint doesn't have the data to answer the query; I was just trying to get an idea of whether it was an issue of POST vs GET or an issue with the server):

Sync Query (GET): 00:00:00.0420308

Result: True

0 Results

Sync Query (POST): 00:00:00.0020514

Result: True

0 Results

Async Query: 00:00:00.0042980

Result: True

0 Results

So as you can see, for all three endpoints the Sync Query (GET) is virtually instant, which is what you said you'd observed when using the web interface or other SPARQL clients. The odd thing is that for the other endpoints there is virtually no difference in time between the GET and the POST variants, whereas for DBPedia there is a massive difference, as you noted.
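For anyone wanting to repeat the comparison outside dotNetRDF, here is a rough sketch of the two request styles in Python. It is not the code from the unit tests above. The function names, the `Accept` header value and the timeout are assumptions for illustration, and the endpoint URL is assumed to carry no query string of its own.

```python
import time
import urllib.parse
import urllib.request

ENDPOINT = "http://dbpedia.org/sparql"  # endpoint discussed in this thread
ACCEPT = "application/sparql-results+xml"  # assumed result format


def build_get(endpoint: str, query: str) -> urllib.request.Request:
    """GET variant: the query travels URL-encoded in the query string."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": ACCEPT}, method="GET")


def build_post(endpoint: str, query: str) -> urllib.request.Request:
    """POST variant: the query travels as a URL-encoded form body."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    headers = {
        "Accept": ACCEPT,
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return urllib.request.Request(endpoint, data=body, headers=headers, method="POST")


def timed_fetch(request: urllib.request.Request, timeout: float = 30.0) -> float:
    """Send the request, read the whole response, return elapsed seconds."""
    start = time.perf_counter()  # monotonic, high-resolution clock
    with urllib.request.urlopen(request, timeout=timeout) as response:
        response.read()
    return time.perf_counter() - start


def compare(endpoint: str, query: str) -> dict:
    """Time the same query once via GET and once via POST (needs network)."""
    return {
        "GET": timed_fetch(build_get(endpoint, query)),
        "POST": timed_fetch(build_post(endpoint, query)),
    }
```

Calling `compare(ENDPOINT, query)` with the query text from the original report gives one timing per method. A single run is noisy, so repeating it a few times is advisable before drawing conclusions.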

I will pass this issue on to the DBPedia folks and see what response, if any, I get. As a fix, I can probably rewrite the async query APIs so that they use GET for most queries, which should alleviate this issue for you. I'll get back to you on this again in a couple of days' time.
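One way the "GET for most queries" idea could work is to pick the method from the length of the encoded URL, keeping POST only for queries too long to fit comfortably in a URL. The sketch below is in Python rather than C# and is only an illustration of that idea, not the planned dotNetRDF change. The 2048-character threshold and the name `choose_method` are assumptions.

```python
import urllib.parse

# Assumed cut-off for a "safe" URL length; not a value taken from this thread.
MAX_GET_URL_LENGTH = 2048


def choose_method(endpoint: str, query: str,
                  max_url_length: int = MAX_GET_URL_LENGTH) -> str:
    """Return "GET" when the encoded request URL is short enough, else "POST"."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return "GET" if len(url) <= max_url_length else "POST"
```

With this rule, a short query such as the one in the original report would go out as a GET, while a very large generated query would still fall back to POST.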

Regards,

Rob Vesse

------------------------------------------------------------------------

*From*: "Steve S" <[email protected]>
*Sent*: 16 September 2011 11:03
*To*: "Rob Vesse" <[email protected]>
*Subject*: Query time

Hi Rob,

I've found a small issue with dotNetRDF and WP7.

I'm running the following query against the DBpedia endpoint:

PREFIX p: <http://dbpedia.org/property/>
PREFIX o: <http://dbpedia.org/ontology/>
PREFIX xs: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?ans WHERE
{
    ?s rdf:type o:River .
    ?s p:length ?l .
    FILTER (xs:integer(?l)) .
    OPTIONAL
    {
        ?s rdfs:label ?ans .
        FILTER (lang(?ans ) = 'en')
    }
}
ORDER BY DESC(?l)
LIMIT 1

However, it is taking 8 times longer to return the results than when I run the same query directly at dbpedia.org/sparql or through a Java API (androjena).

The problem seems to be in the following method in the SPARQLRemoteEndpoint class

public void QueryWithResultSet(String query, SparqlResultsCallback callback, Object state)
{
    request.BeginGetRequestStream(result =>
    {
        .
        .
        .

        System.Diagnostics.Debug.WriteLine("1: " + DateTime.Now); //test

        request.BeginGetResponse(innerResult =>
        {
            using (HttpWebResponse response = (HttpWebResponse)request.EndGetResponse(innerResult))
            {
                System.Diagnostics.Debug.WriteLine("2: " + DateTime.Now); //test

                .
                .
                .
            }
        }
    }
}

Any reason why it might be taking more time?

Thanks
Steve


------------------------------------------------------------------------------
All the data continuously generated in your IT infrastructure contains a
definitive record of customers, application performance, security
threats, fraudulent activity and more. Splunk takes this data and makes
sense of it. Business sense. IT sense. Common sense.
http://p.sf.net/sfu/splunk-d2dcopy1


_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion


--

Regards,

Kingsley Idehen 
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen





