Hi Matteo
It depends on exactly what you are trying to time and what store you are
talking to.
For example if the store is TDB then execSelect() is simply causing the query
plan to be generated, optimized and turned into a QueryIterator that can return
the actual results of the query. So timing around execSelect() would not
really be appropriate because TDB has not really done any work to answer the
query at that stage.
However, if your store is a remote store accessed via SPARQL over HTTP (i.e. you
used QueryExecutionFactory.sparqlService() to get a QueryExecution), then timing
around execSelect() might be more useful, because the timing then reflects the
time for the HTTP request to be made and for the store to start returning
results. This won't necessarily mean all results have been returned because,
depending on the response format from the server, ARQ may stream the results.
Generally my recommendation, and what we do internally, is to measure the time
from when we make the execSelect() call to when we finish iterating over the
results. I would recommend not doing anything in the iteration other than
incrementing a count, as otherwise you may skew your figures: what you do with
each result may be far more computationally costly than just iterating over
them.
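As a rough sketch of that approach (IterationTimer is a made-up helper for
illustration, it is not part of ARQ), something like the following would do.
It only relies on the fact that ResultSet is an Iterator, so it is written
against plain java.util types:

```java
import java.util.Iterator;
import java.util.function.Supplier;

// Hypothetical helper (not an ARQ class): measures the time from the call
// that creates the result iterator until the last result has been consumed.
final class IterationTimer {
    final long count;        // number of results iterated
    final long elapsedNanos; // time from before exec to end of iteration

    private IterationTimer(long count, long elapsedNanos) {
        this.count = count;
        this.elapsedNanos = elapsedNanos;
    }

    // exec is the call being timed, e.g. () -> qexec.execSelect()
    static IterationTimer time(Supplier<? extends Iterator<?>> exec) {
        long start = System.nanoTime();
        Iterator<?> results = exec.get();
        long count = 0;
        while (results.hasNext()) {
            results.next(); // consume the result but do nothing with it
            count++;        // counting is the only work done per result
        }
        return new IterationTimer(count, System.nanoTime() - start);
    }
}
```

With your snippet you would call IterationTimer.time(() -> qexec.execSelect())
and do the printing to file in a separate, untimed run.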
We have a benchmarking tool that we use internally, and it distinguishes two
measurements, response time and runtime: the former is the time for the first
result to be received and the latter is the time for all results to be
received. Often the two figures can be massively different, especially with
queries that generate very large results.
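To illustrate the distinction, here is a minimal sketch of how the two figures
could be captured in one pass. This is not the code of our tool; ResponseTimer
and its field names are invented for the example, and reporting response time
as equal to runtime when a query returns no results is just an assumption I
made to keep it simple:

```java
import java.util.Iterator;
import java.util.function.Supplier;

// Hypothetical helper (not an ARQ class): records the time to the first
// result (response time) and the time to the last result (runtime).
final class ResponseTimer {
    final long count;         // number of results iterated
    final long responseNanos; // time until the first result was received
    final long runtimeNanos;  // time until all results were received

    private ResponseTimer(long count, long responseNanos, long runtimeNanos) {
        this.count = count;
        this.responseNanos = responseNanos;
        this.runtimeNanos = runtimeNanos;
    }

    // exec is the call being timed, e.g. () -> qexec.execSelect()
    static ResponseTimer time(Supplier<? extends Iterator<?>> exec) {
        long start = System.nanoTime();
        Iterator<?> results = exec.get();
        long responseNanos = -1;
        long count = 0;
        while (results.hasNext()) {
            results.next();
            if (count == 0) {
                // first result has just been received
                responseNanos = System.nanoTime() - start;
            }
            count++;
        }
        long runtimeNanos = System.nanoTime() - start;
        if (count == 0) {
            // assumption: with no results, response time equals runtime
            responseNanos = runtimeNanos;
        }
        return new ResponseTimer(count, responseNanos, runtimeNanos);
    }
}
```

Since ResultSet is an Iterator, ResponseTimer.time(() -> qexec.execSelect())
should work for both a local model and a remote service.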
Hope this helps
Rob
On Feb 17, 2012, at 8:39 AM, Matteo Casu wrote:
> Dear list,
>
> I'm doing some experiments comparing SPARQL querying on different
> platforms, with different entailment regimes (mainly RDFS and OWL2QL). I
> came into the overhead problem of ResultSet (or also ResultSetFormatter).
> Reading other threads, I learned that ARQ works as a buffer, and that the
> execSelect method does not really compute the query. The real results are
> retrieved looping over the resultSet. I naively thought that the execSelect
> would have done the query, and that ResultSet was only for printing or
> seeing results.
> Now, the question is: I would like to know whether the right thing for
> measuring the time of querying is:
> - to take the time before and after the execution of execSelect() , or
> - at the end of the loop...
>
> Any hint would be highly appreciated. Thanks in advance!
>
> Mat
> _______________________
>
> Here the snippet:
>
> Query query = QueryFactory.create(queryString);
> QueryExecution qexec = QueryExecutionFactory.create(query, model);
> ResultSet results = qexec.execSelect();
>
> while (results.hasNext()) { results.next(); /* here print on file */ }