Hi,
when someone submits a SPARQL query which contains an ORDER BY + LIMIT,
ARQ sorts the entire and then takes the first N, according to the specified
LIMIT.
If the ORDER BY operates over a large number of bindings, these sort of
queries can become a problem. We see ORDER BY queries with small LIMITs
and often enough no OFFSET specified.
Fortunately, in the ARQ algebra package there is already a OpTopN [1]
operator, which currently get executed as a sort followed by a slice,
see OpExecutor [2]:
protected QueryIterator execute(OpTopN opTop, QueryIterator input)
{
// XXX Do better; execute - OpTopN
QueryIterator qIter = executeOp(opTop.getSubOp(), input) ;
qIter = new QueryIterSort(qIter, opTop.getConditions(), execCxt) ;
qIter = new QueryIterSlice(qIter, 0, opTop.getLimit(), execCxt) ;
return qIter ;
}
The comment suggests something better is possible. :-)
We could avoid the sort, perhaps using a heap [3] and, in Java, a
PriorityQueue [4,5]. We would need to add a QueryIterTopN and use it
instead of the QueryIterSort + QueryIterSlice.
"It is a common misconception that a priority queue is a heap.
A priority queue is an abstract concept like "a list" or "a map";
just as a list can be implemented with a linked list or an array,
a priority queue can be implemented with a heap or a variety of
other methods." [4]
But
A Java PriorityQueue is "an unbounded priority queue based on a
priority heap." [5]
The other thing to do is to spot an ORDER BY + LIMIT (with a sufficiently
small LIMIT?) and use the OpTopN instead of the OpOrder and OpSlice.
This would need to be done in a new TransformOrderByLimit or TransformTopN
transformation to be used by Optimize.java (in which position?).
Does this seem a reasonable thing to do and a reasonable way to do it?
Cheers,
Paolo
[1]
https://svn.apache.org/repos/asf/incubator/jena/Jena2/ARQ/trunk/src/com/hp/hpl/jena/sparql/algebra/op/OpTopN.java
[2]
https://svn.apache.org/repos/asf/incubator/jena/Jena2/ARQ/trunk/src/com/hp/hpl/jena/sparql/engine/main/OpExecutor.java
[3] http://en.wikipedia.org/wiki/Heap_%28data_structure%29
[4] http://en.wikipedia.org/wiki/Priority_queue
[5] http://download.oracle.com/javase/6/docs/api/java/util/PriorityQueue.html