Hi,
when someone submits a SPARQL query which contains an ORDER BY + LIMIT,
ARQ sorts the entire and then takes the first N, according to the specified
LIMIT.

If the ORDER BY operates over a large number of bindings, these sort of
queries can become a problem. We see ORDER BY queries with small LIMITs
and often enough no OFFSET specified.

Fortunately, in the ARQ algebra package there is already a OpTopN [1]
operator, which currently get executed as a sort followed by a slice,
see OpExecutor [2]:

  protected QueryIterator execute(OpTopN opTop, QueryIterator input)
  {
      // XXX Do better; execute - OpTopN
      QueryIterator qIter = executeOp(opTop.getSubOp(), input) ;
      qIter = new QueryIterSort(qIter, opTop.getConditions(), execCxt) ;
      qIter = new QueryIterSlice(qIter, 0, opTop.getLimit(), execCxt) ;
      return qIter ;
  }

The comment suggests something better is possible. :-)

We could avoid the sort, perhaps using a heap [3] and, in Java, a
PriorityQueue [4,5]. We would need to add a QueryIterTopN and use it
instead of the QueryIterSort + QueryIterSlice.

 "It is a common misconception that a priority queue is a heap.
  A priority queue is an abstract concept like "a list" or "a map";
  just as a list can be implemented with a linked list or an array,
  a priority queue can be implemented with a heap or a variety of
  other methods." [4]

But

  A Java PriorityQueue is "an unbounded priority queue based on a
  priority heap." [5]

The other thing to do is to spot an ORDER BY + LIMIT (with a sufficiently
small LIMIT?) and use the OpTopN instead of the OpOrder and OpSlice.
This would need to be done in a new TransformOrderByLimit or TransformTopN
transformation to be used by Optimize.java (in which position?).

Does this seem a reasonable thing to do and a reasonable way to do it?

Cheers,
Paolo

 [1] 
https://svn.apache.org/repos/asf/incubator/jena/Jena2/ARQ/trunk/src/com/hp/hpl/jena/sparql/algebra/op/OpTopN.java
 [2] 
https://svn.apache.org/repos/asf/incubator/jena/Jena2/ARQ/trunk/src/com/hp/hpl/jena/sparql/engine/main/OpExecutor.java
 [3] http://en.wikipedia.org/wiki/Heap_%28data_structure%29
 [4] http://en.wikipedia.org/wiki/Priority_queue
 [5] http://download.oracle.com/javase/6/docs/api/java/util/PriorityQueue.html

Reply via email to