Re: Jena ResultSet

Robert Vesse Fri, 17 Feb 2012 09:28:40 -0800

Hi Matteo

It depends on exactly what you are trying to time and what store you are 
talking to.

For example, if the store is TDB, then execSelect() simply causes the query
plan to be generated, optimized and turned into a QueryIterator that can return
the actual results of the query. So timing around execSelect() alone would not
be appropriate, because TDB has not yet done any real work to answer the query
at that stage.
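To make the lazy behaviour concrete, here is a small stdlib-only Java sketch. LazyResults is a hypothetical stand-in for a QueryIterator-backed ResultSet, not a Jena class: constructing it (the analogue of execSelect()) computes nothing, and each "solution" is only produced when the caller iterates.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for a lazily evaluated result set (not Jena code). */
final class LazyResults implements Iterator<Integer> {
    private final int total;  // how many solutions the "query" would yield
    private int produced = 0; // how many have actually been computed so far

    LazyResults(int total) {
        // Like execSelect() against TDB: sets up the iterator, computes nothing yet.
        this.total = total;
    }

    @Override
    public boolean hasNext() {
        return produced < total;
    }

    @Override
    public Integer next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        // The real work happens here, one solution at a time.
        return produced++;
    }

    /** Number of solutions computed so far. */
    int workDone() {
        return produced;
    }
}
```

Timing only the constructor would measure almost nothing: workDone() stays at 0 until the loop actually pulls results through.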

However, if your store is a remote store accessed via SPARQL over HTTP (i.e. you
used QueryExecutionFactory.sparqlService() to get a QueryExecution), then timing
around execSelect() might be more useful, because the timing then reflects the
time for the HTTP request to be made and for the store to start returning
results. This won't necessarily mean all results have been returned, since,
depending on the response format from the server, ARQ may stream the results.

Generally, my recommendation, and what we do internally, is to measure the time
from when we make the execSelect() call to when we finish iterating over the
results. I would recommend not doing anything in the iteration other than
incrementing a count; otherwise you may skew your figures, since what you do
with each result may be far more computationally costly than just iterating
over them.
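A minimal sketch of that approach, using only the JDK. QueryTimer and timeRuntime are hypothetical names; the helper accepts any Supplier of an Iterator, and since Jena's ResultSet is an Iterator<QuerySolution>, passing qexec::execSelect should fit, though that wiring is an assumption here rather than something tested against Jena.

```java
import java.util.Iterator;
import java.util.function.Supplier;

/** Hypothetical helper: runtime = from starting the query to exhausting the results. */
final class QueryTimer {
    /** One timed run: number of rows and elapsed wall-clock nanoseconds. */
    static final class Timing {
        final long rows;
        final long elapsedNanos;

        Timing(long rows, long elapsedNanos) {
            this.rows = rows;
            this.elapsedNanos = elapsedNanos;
        }
    }

    /**
     * Starts the clock, calls exec (e.g. qexec::execSelect), drains the iterator
     * doing nothing but counting, and stops the clock after the last row.
     */
    static Timing timeRuntime(Supplier<? extends Iterator<?>> exec) {
        long start = System.nanoTime();
        Iterator<?> results = exec.get();
        long rows = 0;
        while (results.hasNext()) {
            results.next(); // consume the row, but do no per-row work
            rows++;
        }
        long elapsed = System.nanoTime() - start;
        return new Timing(rows, elapsed);
    }
}
```

Any per-row work, such as writing each result to a file, is deliberately kept out of the timed loop.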

We have a benchmarking tool that we use internally, and we distinguish these two
things as response time and runtime: the former is the time for the first
result to be received and the latter the time for all results to be received.
The two figures can often be massively different, especially with queries that
generate very large results.
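The exact implementation in our tool isn't shown here, but one plausible way to capture both figures in a single pass is sketched below in plain Java. ResponseVsRuntime and measure are hypothetical names, and "first result received" is taken to mean the moment the first next() call returns.

```java
import java.util.Iterator;
import java.util.function.Supplier;

/** Hypothetical sketch: response time (first result) vs runtime (all results). */
final class ResponseVsRuntime {
    final long rows;
    final long responseNanos; // start -> first result received; -1 if no results
    final long runtimeNanos;  // start -> iteration finished

    private ResponseVsRuntime(long rows, long responseNanos, long runtimeNanos) {
        this.rows = rows;
        this.responseNanos = responseNanos;
        this.runtimeNanos = runtimeNanos;
    }

    /** exec starts the query, e.g. qexec::execSelect (assumed wiring). */
    static ResponseVsRuntime measure(Supplier<? extends Iterator<?>> exec) {
        long start = System.nanoTime();
        Iterator<?> results = exec.get();
        long firstAt = -1;
        long rows = 0;
        while (results.hasNext()) {
            results.next(); // only count, so per-row work does not skew the figures
            if (rows == 0) {
                firstAt = System.nanoTime() - start; // first result has arrived
            }
            rows++;
        }
        long total = System.nanoTime() - start;
        return new ResponseVsRuntime(rows, firstAt, total);
    }
}
```

With a streaming result set the response time can be small even when the runtime is large; the gap between the two is what grows with very large results.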

Hope this helps

Rob

On Feb 17, 2012, at 8:39 AM, Matteo Casu wrote:

> Dear list,
> 
> I'm doing some experiments comparing SPARQL querying on different
> platforms, with different entailment regimes (mainly RDFS and OWL2QL). I
> came across the overhead problem of ResultSet (or also ResultSetFormatter).
> Reading other threads, I learned that ARQ works as a buffer, and that the
> execSelect method does not really compute the query. The real results are
> retrieved by looping over the ResultSet. I naively thought that execSelect
> would have done the query, and that ResultSet was only for printing or
> seeing results.
> Now, the question is: I would like to know whether the right way of
> measuring query time is:
> - to take the time before and after the execution of execSelect(), or
> - to take it at the end of the loop...
> 
> Any hint would be highly appreciated. Thanks in advance!
> 
> Mat
> _______________________
> 
> Here is the snippet:
>
> Query query = QueryFactory.create(queryString);
> QueryExecution qexec = QueryExecutionFactory.create(query, model);
> ResultSet results = qexec.execSelect();
>
> while (results.hasNext()) {
>     QuerySolution soln = results.nextSolution();
>     // here print soln to file
> }
