We have been troubled by a lot of SPARQL parsing overhead in our
code base, and I have been looking for something like SQL's
PreparedStatement, with the ability to parse the statement once and
set parameters upon execution. Digging around on Q&A sites and in
the code base, I have come up with the following code:

    private static final Query q1template = .. create query with ?param expression...
    ...
    QuerySolutionMap params = new QuerySolutionMap();
    params.add("param", myresource);
    QueryExecution exec = QueryExecutionFactory.create(q1template, model, params);

This doubled the speed of our code, since the SPARQL parsing is pretty
heavy and can be made a "constant" part, like an SQL PreparedStatement.
Now this does not really work, since there is mutable state in
q1template and it can't be shared between threads (it fails in
setResultVars intermittently, and possibly in other places too).
Looking at the code base, I see several potential gotchas lurking
right under the hood: too many per-request bits and pieces of state
are kept in the Query.
Not giving up that easily, I created the following method:

    public static Query copyOf(Query query) {
        Query copy = new Query(query);
        // Shallow: the pattern Element tree is shared with the original, not deep-copied.
        copy.setQueryPattern(query.getQueryPattern());

        // Carry over the query form.
        if (query.isSelectType()) copy.setQuerySelectType();
        else if (query.isAskType()) copy.setQueryAskType();
        else if (query.isConstructType()) copy.setQueryConstructType();
        else if (query.isDescribeType()) copy.setQueryDescribeType();

        // Carry over ORDER BY conditions, if any.
        if (query.hasOrderBy()) {
            for (SortCondition sortCondition : query.getOrderBy()) {
                copy.addOrderBy(sortCondition);
            }
        }
        return copy;
    }

And now I can seemingly use

    QueryExecution exec = QueryExecutionFactory.create(copyOf(q1template), model, params);

and actually get this working concurrently with re-use of the parsed
SPARQL expression. But of course there's ElementSubQuery pointing back
at the Query object, so a deep copy of the queryPattern Element
structure would probably also be required, which all seems a bit like
barking up the wrong tree. And I'm not really sure I understand all
the cross-references well enough to know whether this would even
work.
It would seem to me that the "clean" thing to do would be to separate
the parsed SPARQL from the per-request state. Judging by the name
alone, "QueryExecution" sounds like a nice place for these things.
Would this work? Is it a good idea?
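
To make the idea concrete, here is a minimal sketch of that split. It is
not ARQ code: ParsedTemplate and Execution are hypothetical names, a
plain string stands in for the parsed syntax tree, and the textual
substitution in render() is for illustration only. The point is just
who owns which state: the template is read-only and shared, and
everything that varies per request lives in the execution object.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the parsed, read-only part of a query. */
final class ParsedTemplate {
    // Stands in for the parsed syntax tree; never modified after construction.
    private final String pattern;

    ParsedTemplate(String pattern) { this.pattern = pattern; }

    String pattern() { return pattern; }

    /** Each request gets its own Execution; the template itself is never mutated. */
    Execution newExecution() { return new Execution(this); }
}

/** Hypothetical holder for everything that changes per request. */
final class Execution {
    private final ParsedTemplate template;
    private final Map<String, String> bindings = new HashMap<>();

    Execution(ParsedTemplate template) { this.template = template; }

    /** Records a per-request parameter value and returns this for chaining. */
    Execution bind(String var, String value) {
        bindings.put(var, value);
        return this;
    }

    /** Naive textual substitution into the shared pattern (illustration only). */
    String render() {
        String out = template.pattern();
        for (Map.Entry<String, String> e : bindings.entrySet()) {
            out = out.replace("?" + e.getKey(), e.getValue());
        }
        return out;
    }
}
```

Because no request ever writes to ParsedTemplate, a single instance can
sit in a static final field and be used from any number of threads.
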
Alternatively (and much simpler), one could simply make a "reset"
method on the Query object and accept sequential re-use. That way I
could use ThreadLocals to keep one instance of the query per thread
and reduce the overall SPARQL parsing.
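
A rough sketch of the ThreadLocal part, using only the JDK.
PerThreadQuery is a hypothetical helper, and the type parameter Q
stands in for the Query class so that the snippet compiles without ARQ
on the classpath. The "reset" method does not exist today, so it is
not shown.

```java
import java.util.function.Supplier;

/** Hypothetical helper: one parsed instance per thread, so each thread parses once. */
final class PerThreadQuery<Q> {
    private final ThreadLocal<Q> perThread;

    PerThreadQuery(Supplier<Q> parser) {
        // withInitial runs the parser lazily, the first time each thread calls get().
        this.perThread = ThreadLocal.withInitial(parser);
    }

    /** Returns the calling thread's own instance; it is never handed to another thread. */
    Q get() {
        return perThread.get();
    }
}
```

With ARQ, the supplier would presumably be something like
() -> QueryFactory.create(sparqlText), and the caller would invoke the
proposed reset before each re-use. The cost is one parse per thread
instead of one per request.
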
I'm sort of digging a bit for suggestions/ideas here, since I'm not
all that familiar with the code base :) We're talking about a fairly
heavy CO2 footprint in terms of energy expenditure for repeated
parsing of all that SPARQL :)
(Initially I addressed this to the "users" list, but I sort of changed
my mind as the details got gorier and gorier...)
Kristian