Re: How does Jena-ARQ execute the queries containing ORDER BY + LIMIT clause

Rose Beck Tue, 10 Mar 2015 04:50:49 -0700

Dear Rob,

Thanks a lot again :)


It seems to me that Index Joins are an efficient solution when the
query plan is a left deep join tree. However, when the query plan is
bushy and the RHS itself includes many Index Join operators
(essentially forming a sub-tree), then dont you think computing the
entire RHS sub-tree for each join result of the LHS sub-tree is
expensive. Please correct me if I am wrong. Or does Jena only builds
left deep join trees.

On Tue, Mar 10, 2015 at 5:11 PM, Rob Vesse <[email protected]> wrote:
> Jena does use hash joins where necessary which are indeed blocking
>
> However in most cases it can instead use index joins which are essentially
> a form of merge join whereby you substitute candidate solutions from the
> LHS into the RHS and evaluate the RHS for each LHS solution
>
> Rob
>
> On 10/03/2015 11:10, "Rose Beck" <[email protected]> wrote:
>
>>Dear Rob,
>>
>>Thanks a lot for the reply again.
>>
>>But I am curious does Jena implement Hash joins -- which are
>>essentially blocking in nature. If Jena does not, then how is a join
>>between two unsorted lists (of intermediate results) done in Jena?
>>
>>
>>
>>On Tue, Mar 10, 2015 at 4:34 PM, Rob Vesse <[email protected]> wrote:
>>> Yes that is pretty much what happens
>>>
>>> Note though that most query evaluation in ARQ is done in a streaming
>>> fashion so the full set of solutions is typically never held in memory
>>>for
>>> any query unless an operator which requires full /partial
>>>materialisation
>>> e.g. DISTINCT is encountered
>>>
>>> Rob
>>>
>>> On 10/03/2015 10:24, "Rose Beck" <[email protected]> wrote:
>>>
>>>>Dear Rob,
>>>>
>>>>First and foremost thank you for such a wonderful explanation.
>>>>
>>>>Just to clarify, say my example query is:
>>>>
>>>>select ?a?b?c where{?a <pred1> ?c. ?a <pred2> ?b} order by ?c limit 10
>>>>
>>>>Then for the query above all the solutions are generated just as it
>>>>would be the case for the SPARQL query: select ?a?b?c where{?a <pred1>
>>>>?c. ?a <pred2> ?b}. But within the priority queue (which is the last
>>>>operation which is applied before results are output to the users) at
>>>>any point just 10 solutions ordered by ?c are placed. Please correct
>>>>me if I am wrong.
>>>>
>>>>Cheers,
>>>>Rose
>>>>
>>>>
>>>>On Tue, Mar 10, 2015 at 3:38 PM, Rob Vesse <[email protected]> wrote:
>>>>> Query execution in ARQ is based on nested iterators so QueryIterTopN
>>>>>will
>>>>> always apply over another iterator
>>>>>
>>>>> A PriorityQueue is used internally as temporary storage within
>>>>> QueryIterTopN while it exhausts the inner iterator allowing it to only
>>>>>use
>>>>> at most the limit amount of storage in the priority queue plus
>>>>>whatever
>>>>> temporary storage the inner iterator(s) may need.
>>>>>
>>>>> There is still a "total sort" in the sense that every possible
>>>>>solution
>>>>> has to be compared to see if it should be placed into the priority
>>>>>queue
>>>>> however there is not a "total sort" in the sense of needing to
>>>>>materialise
>>>>> all possible solutions into memory and then sort.
>>>>>
>>>>> Rob
>>>>>
>>>>> p.s. Please don't post identical questions to both users@ and dev@ -
>>>>>one
>>>>> list is sufficient as the developers are on both lists.  As a general
>>>>>rule
>>>>> general support questions should go to users@ and
>>>>>technical/architecture
>>>>> questions like this should go to dev@
>>>>>
>>>>>
>>>>>
>>>>> On 10/03/2015 05:54, "Rose Beck" <[email protected]> wrote:
>>>>>
>>>>>>Hi,
>>>>>>
>>>>>>I saw the following issue posted on Jena website (which has been
>>>>>>recently resolved):
>>>>>>Avoid a total sort for ORDER BY + LIMIT queries
>>>>>>(https://issues.apache.org/jira/browse/JENA-89).
>>>>>>
>>>>>>I am very interested in understanding as to how does Jena-ARQ avoids
>>>>>>total sort for ORDER BY + LIMIT queries. In the post it is mentioned
>>>>>>that Jena-ARQ uses priority queue for avoiding a final sort, however
>>>>>>it is also mentioned that "ARQ's algebra package contains already a
>>>>>>OpTopN [3] operator. The OpExecutor [4] will need to use a new
>>>>>>QueryIterTopN instead of QueryIterSort + QueryIterSlice." It is not
>>>>>>clear now does the priority queue benefit from OpTopN operator and
>>>>>>QueryIterTopN as the links [3] and [4] mentioned on the website does
>>>>>>not work, so I am not able to understand their operation and as to how
>>>>>>do they help in avoiding a total sort.
>>>>>>
>>>>>>Can someone please explain how does Jena-ARQ execute the queries
>>>>>>containing ORDER BY + LIMIT clause.
>>>>>>
>>>>>>With Warm Regards,
>>>>>>Rose
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>--
>>>>With Warm Regards,
>>>>Rose
>>>
>>>
>>>
>>>
>>
>>
>>
>>--
>>With Warm Regards,
>>Rose
>
>
>
>



-- 
With Warm Regards,
Rose

Re: How does Jena-ARQ execute the queries containing ORDER BY + LIMIT clause

Reply via email to