Re: How does Jena-ARQ execute the queries containing ORDER BY + LIMIT clause

Rob Vesse Tue, 10 Mar 2015 07:13:47 -0700

Andy

Paul's MINUS implementation does use a hash join


Rob

On 10/03/2015 12:16, "Andy Seaborne" <[email protected]> wrote:

>What's the background for all these questions ??!!
>
>Index Joins have the useful feature that they take constant memory
>overhead.  This means one major area where RAM would otherwise run out
>is removed.
>
>ARQ does not currently use hash joins.  It should, though SPARQL tends
>quite strongly to produce chains of joins. (There are cases in SPARQL
>involving OPTIONALs where index joins, with scope elimination, does not
>work as an execution strategy.  See the literature.)
>
>I have an alternative library of a collection of various join algorithms
>that can be used to build alternative query execution strategies.  This
>evaluator is not ready for production and not part of the project (yet?).
>
>Pipeline joins are interesting as a sort of two-sided hash join which is
>non-blocking.  Useful for multi-core and distributed evaluation.
>
>       Andy
>
>
>
>On 10/03/15 11:49, Rose Beck wrote:
>> Dear Rob,
>>
>> Thanks a lot again :)
>>
>> It seems to me that Index Joins are an efficient solution when the
>> query plan is a left deep join tree. However, when the query plan is
>> bushy and the RHS itself includes many Index Join operators
>> (essentially forming a sub-tree), then dont you think computing the
>> entire RHS sub-tree for each join result of the LHS sub-tree is
>> expensive. Please correct me if I am wrong. Or does Jena only builds
>> left deep join trees.
>>
>> On Tue, Mar 10, 2015 at 5:11 PM, Rob Vesse <[email protected]> wrote:
>>> Jena does use hash joins where necessary which are indeed blocking
>>>
>>> However in most cases it can instead use index joins which are
>>>essentially
>>> a form of merge join whereby you substitute candidate solutions from
>>>the
>>> LHS into the RHS and evaluate the RHS for each LHS solution
>>>
>>> Rob
>>>
>>> On 10/03/2015 11:10, "Rose Beck" <[email protected]> wrote:
>>>
>>>> Dear Rob,
>>>>
>>>> Thanks a lot for the reply again.
>>>>
>>>> But I am curious does Jena implement Hash joins -- which are
>>>> essentially blocking in nature. If Jena does not, then how is a join
>>>> between two unsorted lists (of intermediate results) done in Jena?
>>>>
>>>>
>>>>
>>>> On Tue, Mar 10, 2015 at 4:34 PM, Rob Vesse <[email protected]>
>>>>wrote:
>>>>> Yes that is pretty much what happens
>>>>>
>>>>> Note though that most query evaluation in ARQ is done in a streaming
>>>>> fashion so the full set of solutions is typically never held in
>>>>>memory
>>>>> for
>>>>> any query unless an operator which requires full /partial
>>>>> materialisation
>>>>> e.g. DISTINCT is encountered
>>>>>
>>>>> Rob
>>>>>
>>>>> On 10/03/2015 10:24, "Rose Beck" <[email protected]> wrote:
>>>>>
>>>>>> Dear Rob,
>>>>>>
>>>>>> First and foremost thank you for such a wonderful explanation.
>>>>>>
>>>>>> Just to clarify, say my example query is:
>>>>>>
>>>>>> select ?a?b?c where{?a <pred1> ?c. ?a <pred2> ?b} order by ?c limit
>>>>>>10
>>>>>>
>>>>>> Then for the query above all the solutions are generated just as it
>>>>>> would be the case for the SPARQL query: select ?a?b?c where{?a
>>>>>><pred1>
>>>>>> ?c. ?a <pred2> ?b}. But within the priority queue (which is the last
>>>>>> operation which is applied before results are output to the users)
>>>>>>at
>>>>>> any point just 10 solutions ordered by ?c are placed. Please correct
>>>>>> me if I am wrong.
>>>>>>
>>>>>> Cheers,
>>>>>> Rose
>>>>>>
>>>>>>
>>>>>> On Tue, Mar 10, 2015 at 3:38 PM, Rob Vesse <[email protected]>
>>>>>>wrote:
>>>>>>> Query execution in ARQ is based on nested iterators so
>>>>>>>QueryIterTopN
>>>>>>> will
>>>>>>> always apply over another iterator
>>>>>>>
>>>>>>> A PriorityQueue is used internally as temporary storage within
>>>>>>> QueryIterTopN while it exhausts the inner iterator allowing it to
>>>>>>>only
>>>>>>> use
>>>>>>> at most the limit amount of storage in the priority queue plus
>>>>>>> whatever
>>>>>>> temporary storage the inner iterator(s) may need.
>>>>>>>
>>>>>>> There is still a "total sort" in the sense that every possible
>>>>>>> solution
>>>>>>> has to be compared to see if it should be placed into the priority
>>>>>>> queue
>>>>>>> however there is not a "total sort" in the sense of needing to
>>>>>>> materialise
>>>>>>> all possible solutions into memory and then sort.
>>>>>>>
>>>>>>> Rob
>>>>>>>
>>>>>>> p.s. Please don't post identical questions to both users@ and dev@
>>>>>>>-
>>>>>>> one
>>>>>>> list is sufficient as the developers are on both lists.  As a
>>>>>>>general
>>>>>>> rule
>>>>>>> general support questions should go to users@ and
>>>>>>> technical/architecture
>>>>>>> questions like this should go to dev@
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 10/03/2015 05:54, "Rose Beck" <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I saw the following issue posted on Jena website (which has been
>>>>>>>> recently resolved):
>>>>>>>> Avoid a total sort for ORDER BY + LIMIT queries
>>>>>>>> (https://issues.apache.org/jira/browse/JENA-89).
>>>>>>>>
>>>>>>>> I am very interested in understanding as to how does Jena-ARQ
>>>>>>>>avoids
>>>>>>>> total sort for ORDER BY + LIMIT queries. In the post it is
>>>>>>>>mentioned
>>>>>>>> that Jena-ARQ uses priority queue for avoiding a final sort,
>>>>>>>>however
>>>>>>>> it is also mentioned that "ARQ's algebra package contains already
>>>>>>>>a
>>>>>>>> OpTopN [3] operator. The OpExecutor [4] will need to use a new
>>>>>>>> QueryIterTopN instead of QueryIterSort + QueryIterSlice." It is
>>>>>>>>not
>>>>>>>> clear now does the priority queue benefit from OpTopN operator and
>>>>>>>> QueryIterTopN as the links [3] and [4] mentioned on the website
>>>>>>>>does
>>>>>>>> not work, so I am not able to understand their operation and as
>>>>>>>>to how
>>>>>>>> do they help in avoiding a total sort.
>>>>>>>>
>>>>>>>> Can someone please explain how does Jena-ARQ execute the queries
>>>>>>>> containing ORDER BY + LIMIT clause.
>>>>>>>>
>>>>>>>> With Warm Regards,
>>>>>>>> Rose
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> With Warm Regards,
>>>>>> Rose
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> With Warm Regards,
>>>> Rose
>>>
>>>
>>>
>>>
>>
>>
>>
>

Re: How does Jena-ARQ execute the queries containing ORDER BY + LIMIT clause

Reply via email to