Avoid a total sort for ORDER BY + LIMIT queries
-----------------------------------------------
Key: JENA-89
URL: https://issues.apache.org/jira/browse/JENA-89
Project: Jena
Issue Type: Improvement
Components: ARQ
Reporter: Paolo Castagna
Assignee: Paolo Castagna
Priority: Minor
In case of SPARQL queries with ORDER BY + LIMIT, ARQ sorts the entire result
set and then produces the first N, according to the specified LIMIT.
As an alternative, we can use a
[PriorityQueue|http://download.oracle.com/javase/6/docs/api/java/util/PriorityQueue.html]
(which is based on a priority heap) to avoid a sort operation.
ARQ's algebra package contains already a
[OpTopN|https://svn.apache.org/repos/asf/incubator/jena/Jena2/ARQ/trunk/src/com/hp/hpl/jena/sparql/algebra/op/OpTopN.java]
operator. The
[OpExecutor|https://svn.apache.org/repos/asf/incubator/jena/Jena2/ARQ/trunk/src/com/hp/hpl/jena/sparql/engine/main/OpExecutor.java]
will need to use a new QueryIterTopN instead of QueryIterSort +
QueryIterSlice. A new TransformOrderByLimit to be used by Optimize is also
necessary.
ORDER BY + LIMIT queries are typically used to construct the first page when
results are paginated. Then the following query is ORDER BY + OFFSET + LIMIT.
(Often users stop at the first page). Ideally, we could cache the ORDER BY and
implement the OFFSET|LIMIT using results from the cache. However, the
improvement described by this issue is limited to the ORDER BY + LIMIT case for
which a priority heap is a good enough solution.
Hopefully, this would improve the scalability of ORDER BY + LIMIT queries in
case of small values specified on the LIMIT.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira