Re: How does Jena-ARQ execute the queries containing ORDER BY + LIMIT clause

Rob Vesse Tue, 10 Mar 2015 07:13:51 -0700

Understand that we are not executing the entire RHS for every LHS solution
we are executing the RHS constrained to the constants provided by the LHS


So we are evaluating many more specific variants of the RHS and joining
directly rather than evaluating the generic form of the RHS and having to
join afterwards

Rob

On 10/03/2015 11:49, "Rose Beck" <[email protected]> wrote:

>Dear Rob,
>
>Thanks a lot again :)
>
>It seems to me that Index Joins are an efficient solution when the
>query plan is a left deep join tree. However, when the query plan is
>bushy and the RHS itself includes many Index Join operators
>(essentially forming a sub-tree), then dont you think computing the
>entire RHS sub-tree for each join result of the LHS sub-tree is
>expensive. Please correct me if I am wrong. Or does Jena only builds
>left deep join trees.
>
>On Tue, Mar 10, 2015 at 5:11 PM, Rob Vesse <[email protected]> wrote:
>> Jena does use hash joins where necessary which are indeed blocking
>>
>> However in most cases it can instead use index joins which are
>>essentially
>> a form of merge join whereby you substitute candidate solutions from the
>> LHS into the RHS and evaluate the RHS for each LHS solution
>>
>> Rob
>>
>> On 10/03/2015 11:10, "Rose Beck" <[email protected]> wrote:
>>
>>>Dear Rob,
>>>
>>>Thanks a lot for the reply again.
>>>
>>>But I am curious does Jena implement Hash joins -- which are
>>>essentially blocking in nature. If Jena does not, then how is a join
>>>between two unsorted lists (of intermediate results) done in Jena?
>>>
>>>
>>>
>>>On Tue, Mar 10, 2015 at 4:34 PM, Rob Vesse <[email protected]> wrote:
>>>> Yes that is pretty much what happens
>>>>
>>>> Note though that most query evaluation in ARQ is done in a streaming
>>>> fashion so the full set of solutions is typically never held in memory
>>>>for
>>>> any query unless an operator which requires full /partial
>>>>materialisation
>>>> e.g. DISTINCT is encountered
>>>>
>>>> Rob
>>>>
>>>> On 10/03/2015 10:24, "Rose Beck" <[email protected]> wrote:
>>>>
>>>>>Dear Rob,
>>>>>
>>>>>First and foremost thank you for such a wonderful explanation.
>>>>>
>>>>>Just to clarify, say my example query is:
>>>>>
>>>>>select ?a?b?c where{?a <pred1> ?c. ?a <pred2> ?b} order by ?c limit 10
>>>>>
>>>>>Then for the query above all the solutions are generated just as it
>>>>>would be the case for the SPARQL query: select ?a?b?c where{?a <pred1>
>>>>>?c. ?a <pred2> ?b}. But within the priority queue (which is the last
>>>>>operation which is applied before results are output to the users) at
>>>>>any point just 10 solutions ordered by ?c are placed. Please correct
>>>>>me if I am wrong.
>>>>>
>>>>>Cheers,
>>>>>Rose
>>>>>
>>>>>
>>>>>On Tue, Mar 10, 2015 at 3:38 PM, Rob Vesse <[email protected]>
>>>>>wrote:
>>>>>> Query execution in ARQ is based on nested iterators so QueryIterTopN
>>>>>>will
>>>>>> always apply over another iterator
>>>>>>
>>>>>> A PriorityQueue is used internally as temporary storage within
>>>>>> QueryIterTopN while it exhausts the inner iterator allowing it to
>>>>>>only
>>>>>>use
>>>>>> at most the limit amount of storage in the priority queue plus
>>>>>>whatever
>>>>>> temporary storage the inner iterator(s) may need.
>>>>>>
>>>>>> There is still a "total sort" in the sense that every possible
>>>>>>solution
>>>>>> has to be compared to see if it should be placed into the priority
>>>>>>queue
>>>>>> however there is not a "total sort" in the sense of needing to
>>>>>>materialise
>>>>>> all possible solutions into memory and then sort.
>>>>>>
>>>>>> Rob
>>>>>>
>>>>>> p.s. Please don't post identical questions to both users@ and dev@ -
>>>>>>one
>>>>>> list is sufficient as the developers are on both lists.  As a
>>>>>>general
>>>>>>rule
>>>>>> general support questions should go to users@ and
>>>>>>technical/architecture
>>>>>> questions like this should go to dev@
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 10/03/2015 05:54, "Rose Beck" <[email protected]> wrote:
>>>>>>
>>>>>>>Hi,
>>>>>>>
>>>>>>>I saw the following issue posted on Jena website (which has been
>>>>>>>recently resolved):
>>>>>>>Avoid a total sort for ORDER BY + LIMIT queries
>>>>>>>(https://issues.apache.org/jira/browse/JENA-89).
>>>>>>>
>>>>>>>I am very interested in understanding as to how does Jena-ARQ avoids
>>>>>>>total sort for ORDER BY + LIMIT queries. In the post it is mentioned
>>>>>>>that Jena-ARQ uses priority queue for avoiding a final sort, however
>>>>>>>it is also mentioned that "ARQ's algebra package contains already a
>>>>>>>OpTopN [3] operator. The OpExecutor [4] will need to use a new
>>>>>>>QueryIterTopN instead of QueryIterSort + QueryIterSlice." It is not
>>>>>>>clear now does the priority queue benefit from OpTopN operator and
>>>>>>>QueryIterTopN as the links [3] and [4] mentioned on the website does
>>>>>>>not work, so I am not able to understand their operation and as to
>>>>>>>how
>>>>>>>do they help in avoiding a total sort.
>>>>>>>
>>>>>>>Can someone please explain how does Jena-ARQ execute the queries
>>>>>>>containing ORDER BY + LIMIT clause.
>>>>>>>
>>>>>>>With Warm Regards,
>>>>>>>Rose
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>--
>>>>>With Warm Regards,
>>>>>Rose
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>--
>>>With Warm Regards,
>>>Rose
>>
>>
>>
>>
>
>
>
>-- 
>With Warm Regards,
>Rose

Re: How does Jena-ARQ execute the queries containing ORDER BY + LIMIT clause

Reply via email to