Re: How does Jena-ARQ execute the queries containing ORDER BY + LIMIT clause

Andy Seaborne Tue, 10 Mar 2015 07:10:47 -0700

On 10/03/15 13:43, Rose Beck wrote:

Dear Andy,


Thanks a lot for your reply :)
Well, these questions do not have a particular background; I just asked them
out of curiosity.

The ORDER BY clause in Jena orders in ascending order by default. Is
there some way to specify in my SPARQL query that the results should be
ordered in descending order? (It would be great if someone could explain
how to do this with an example.)


<cough>spec</cough>

DESC, e.g. ORDER BY DESC(?c)

        Andy


On Tue, Mar 10, 2015 at 5:46 PM, Andy Seaborne wrote:

What's the background for all these questions ??!!

Index Joins have the useful feature that they take constant memory overhead.
This means one major area where RAM would otherwise run out is removed.

ARQ does not currently use hash joins.  It should, though SPARQL tends quite
strongly to produce chains of joins. (There are cases in SPARQL involving
OPTIONALs where index joins, with scope elimination, do not work as an
execution strategy.  See the literature.)

I have a separate library collecting various join algorithms
that can be used to build alternative query execution strategies.  This
evaluator is not ready for production and not part of the project (yet?).

Pipeline joins are interesting as a sort of two-sided hash join which is
non-blocking.  Useful for multi-core and distributed evaluation.
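
To illustrate the "two-sided hash join" idea, here is a minimal Java sketch of
a symmetric hash join. It is not taken from ARQ or from the library mentioned
above; the class name, the method names and the string-based rows are all
assumptions made for the example. Each arriving row is stored in its own
side's table and immediately probed against the other side's table, so results
can be emitted without waiting for either input to finish.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a symmetric (two-sided) hash join; not ARQ code.
final class SymmetricHashJoinSketch {
    private final Map<String, List<String>> leftTable = new HashMap<>();
    private final Map<String, List<String>> rightTable = new HashMap<>();

    /** Accepts one left row and returns the joined rows it completes. */
    List<String> acceptLeft(String key, String value) {
        return accept(key, value, leftTable, rightTable, true);
    }

    /** Accepts one right row and returns the joined rows it completes. */
    List<String> acceptRight(String key, String value) {
        return accept(key, value, rightTable, leftTable, false);
    }

    private static List<String> accept(String key, String value,
                                       Map<String, List<String>> own,
                                       Map<String, List<String>> other,
                                       boolean fromLeft) {
        // Remember the row so that later arrivals on the other side can match it.
        own.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        // Probe the other side's table for rows seen so far with the same key.
        List<String> out = new ArrayList<>();
        for (String match : other.getOrDefault(key, List.of())) {
            out.add(fromLeft ? key + ":" + value + ":" + match
                             : key + ":" + match + ":" + value);
        }
        return out;
    }
}
```

The price of being non-blocking is that both inputs are retained in memory,
unlike an index join.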

         Andy




On 10/03/15 11:49, Rose Beck wrote:


Dear Rob,

Thanks a lot again :)

It seems to me that index joins are an efficient solution when the
query plan is a left-deep join tree. However, when the query plan is
bushy and the RHS itself includes many index join operators
(essentially forming a sub-tree), don't you think that computing the
entire RHS sub-tree for each join result of the LHS sub-tree is
expensive? Please correct me if I am wrong. Or does Jena only build
left-deep join trees?

On Tue, Mar 10, 2015 at 5:11 PM, Rob Vesse wrote:


Jena does use hash joins where necessary, and these are indeed blocking.

However, in most cases it can instead use index joins, which are essentially
a form of merge join whereby you substitute candidate solutions from the
LHS into the RHS and evaluate the RHS for each LHS solution.
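
The substitution step can be sketched in Java for the example query from
earlier in the thread (?a <pred1> ?c . ?a <pred2> ?b). This is only an
illustration and not ARQ's implementation: the nested-map index
(predicate -> subject -> objects), the class name and the helper row() are
assumptions made for the example. Each LHS solution binds ?a, and the RHS is
then answered by an index lookup with ?a already substituted.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

// Hypothetical sketch of an index join by substitution; not ARQ code.
final class IndexJoinSketch {
    // Builds one solution row; a plain map stands in for a variable binding.
    static Map<String, String> row(String a, String b, String c) {
        Map<String, String> r = new LinkedHashMap<>();
        r.put("a", a);
        r.put("b", b);
        r.put("c", c);
        return r;
    }

    // index: predicate -> subject -> objects (an assumed, simplified layout).
    static Stream<Map<String, String>> join(
            Map<String, Map<String, List<String>>> index, String pred1, String pred2) {
        Map<String, List<String>> lhs = index.getOrDefault(pred1, Map.of());
        Map<String, List<String>> rhs = index.getOrDefault(pred2, Map.of());
        // For each LHS solution (?a, ?c), look up the RHS with ?a substituted.
        return lhs.entrySet().stream().flatMap(e ->
                e.getValue().stream().flatMap(c ->
                        rhs.getOrDefault(e.getKey(), List.of()).stream()
                                .map(b -> row(e.getKey(), b, c))));
    }
}
```

The result is a stream pipeline rather than a collected list, so no hash table
or sorted copy of either side is built; the only state is the current LHS
solution and the current position in the RHS lookup.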

Rob

On 10/03/2015 11:10, "Rose Beck" wrote:

Dear Rob,

Thanks a lot for the reply again.

But I am curious: does Jena implement hash joins, which are
essentially blocking in nature? If Jena does not, then how is a join
between two unsorted lists (of intermediate results) done in Jena?



On Tue, Mar 10, 2015 at 4:34 PM, Rob Vesse wrote:


Yes, that is pretty much what happens.

Note though that most query evaluation in ARQ is done in a streaming
fashion, so the full set of solutions is typically never held in memory
for any query unless an operator which requires full/partial
materialisation, e.g. DISTINCT, is encountered.

Rob

On 10/03/2015 10:24, "Rose Beck" wrote:

Dear Rob,

First and foremost thank you for such a wonderful explanation.

Just to clarify, say my example query is:

select ?a?b?c where{?a <pred1> ?c. ?a <pred2> ?b} order by ?c limit 10

Then for the query above all the solutions are generated just as they
would be for the SPARQL query: select ?a?b?c where{?a <pred1>
?c. ?a <pred2> ?b}. But the priority queue (which is the last
operation applied before results are output to the user) holds
just 10 solutions ordered by ?c at any point. Please correct
me if I am wrong.

Cheers,
Rose


On Tue, Mar 10, 2015 at 3:38 PM, Rob Vesse wrote:


Query execution in ARQ is based on nested iterators, so QueryIterTopN will
always apply over another iterator.

A PriorityQueue is used internally as temporary storage within
QueryIterTopN while it exhausts the inner iterator, allowing it to use at
most the limit amount of storage in the priority queue plus whatever
temporary storage the inner iterator(s) may need.

There is still a "total sort" in the sense that every possible solution
has to be compared to see if it should be placed into the priority queue;
however, there is not a "total sort" in the sense of needing to materialise
all possible solutions into memory and then sort.
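
The bounded priority queue idea can be sketched in plain Java as follows. This
is an illustration of the technique only, not the actual QueryIterTopN source;
the class name TopNSketch and the method topN are assumptions made for the
example. The heap is ordered by the reversed comparator, so its head is always
the worst solution kept so far and can be evicted when a better one arrives.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of ORDER BY + LIMIT via a bounded heap; not ARQ code.
final class TopNSketch {
    /** Returns the first `limit` items of `input` in `order`, keeping at most `limit` items in memory. */
    static <T> List<T> topN(Iterator<T> input, Comparator<T> order, int limit) {
        if (limit <= 0) {
            return new ArrayList<>();
        }
        // Reversed comparator: the head of the heap is the worst item kept so far.
        PriorityQueue<T> heap = new PriorityQueue<>(order.reversed());
        while (input.hasNext()) {          // exhaust the inner iterator
            T item = input.next();
            if (heap.size() < limit) {
                heap.add(item);
            } else if (order.compare(item, heap.peek()) < 0) {
                heap.poll();               // evict the current worst
                heap.add(item);
            }
        }
        // Only the retained items (at most `limit`) are sorted for output.
        List<T> result = new ArrayList<>(heap);
        result.sort(order);
        return result;
    }

    public static void main(String[] args) {
        List<Integer> values = List.of(42, 7, 19, 3, 88, 15);
        System.out.println(topN(values.iterator(), Comparator.naturalOrder(), 3)); // [3, 7, 15]
    }
}
```

Every input item is compared once against the head of the heap, but no more
than `limit` items are ever stored, which matches the two senses of "total
sort" described above.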

Rob

p.s. Please don't post identical questions to both users@ and dev@; one
list is sufficient as the developers are on both lists.  As a general
rule, general support questions should go to users@ and
technical/architecture questions like this should go to dev@.



On 10/03/2015 05:54, "Rose Beck" wrote:

Hi,

I saw the following issue posted on the Jena website (which has
recently been resolved):
Avoid a total sort for ORDER BY + LIMIT queries
(https://issues.apache.org/jira/browse/JENA-89).

I am very interested in understanding how Jena-ARQ avoids a
total sort for ORDER BY + LIMIT queries. The issue mentions
that Jena-ARQ uses a priority queue to avoid a final sort, but
it also says that "ARQ's algebra package contains already a
OpTopN [3] operator. The OpExecutor [4] will need to use a new
QueryIterTopN instead of QueryIterSort + QueryIterSlice." It is not
clear how the priority queue benefits from the OpTopN operator and
QueryIterTopN. The links [3] and [4] in the issue do
not work, so I am unable to understand how these components operate and
how they help in avoiding a total sort.

Can someone please explain how Jena-ARQ executes queries
containing an ORDER BY + LIMIT clause?

With Warm Regards,
Rose




