Jena does use hash joins where necessary which are indeed blocking

However in most cases it can instead use index joins which are essentially
a form of merge join whereby you substitute candidate solutions from the
LHS into the RHS and evaluate the RHS for each LHS solution

Rob

On 10/03/2015 11:10, "Rose Beck" <[email protected]> wrote:

>Dear Rob,
>
>Thanks a lot for the reply again.
>
>But I am curious does Jena implement Hash joins -- which are
>essentially blocking in nature. If Jena does not, then how is a join
>between two unsorted lists (of intermediate results) done in Jena?
>
>
>
>On Tue, Mar 10, 2015 at 4:34 PM, Rob Vesse <[email protected]> wrote:
>> Yes that is pretty much what happens
>>
>> Note though that most query evaluation in ARQ is done in a streaming
>> fashion so the full set of solutions is typically never held in memory
>>for
>> any query unless an operator which requires full /partial
>>materialisation
>> e.g. DISTINCT is encountered
>>
>> Rob
>>
>> On 10/03/2015 10:24, "Rose Beck" <[email protected]> wrote:
>>
>>>Dear Rob,
>>>
>>>First and foremost thank you for such a wonderful explanation.
>>>
>>>Just to clarify, say my example query is:
>>>
>>>select ?a?b?c where{?a <pred1> ?c. ?a <pred2> ?b} order by ?c limit 10
>>>
>>>Then for the query above all the solutions are generated just as it
>>>would be the case for the SPARQL query: select ?a?b?c where{?a <pred1>
>>>?c. ?a <pred2> ?b}. But within the priority queue (which is the last
>>>operation which is applied before results are output to the users) at
>>>any point just 10 solutions ordered by ?c are placed. Please correct
>>>me if I am wrong.
>>>
>>>Cheers,
>>>Rose
>>>
>>>
>>>On Tue, Mar 10, 2015 at 3:38 PM, Rob Vesse <[email protected]> wrote:
>>>> Query execution in ARQ is based on nested iterators so QueryIterTopN
>>>>will
>>>> always apply over another iterator
>>>>
>>>> A PriorityQueue is used internally as temporary storage within
>>>> QueryIterTopN while it exhausts the inner iterator allowing it to only
>>>>use
>>>> at most the limit amount of storage in the priority queue plus
>>>>whatever
>>>> temporary storage the inner iterator(s) may need.
>>>>
>>>> There is still a "total sort" in the sense that every possible
>>>>solution
>>>> has to be compared to see if it should be placed into the priority
>>>>queue
>>>> however there is not a "total sort" in the sense of needing to
>>>>materialise
>>>> all possible solutions into memory and then sort.
>>>>
>>>> Rob
>>>>
>>>> p.s. Please don't post identical questions to both users@ and dev@ -
>>>>one
>>>> list is sufficient as the developers are on both lists.  As a general
>>>>rule
>>>> general support questions should go to users@ and
>>>>technical/architecture
>>>> questions like this should go to dev@
>>>>
>>>>
>>>>
>>>> On 10/03/2015 05:54, "Rose Beck" <[email protected]> wrote:
>>>>
>>>>>Hi,
>>>>>
>>>>>I saw the following issue posted on Jena website (which has been
>>>>>recently resolved):
>>>>>Avoid a total sort for ORDER BY + LIMIT queries
>>>>>(https://issues.apache.org/jira/browse/JENA-89).
>>>>>
>>>>>I am very interested in understanding as to how does Jena-ARQ avoids
>>>>>total sort for ORDER BY + LIMIT queries. In the post it is mentioned
>>>>>that Jena-ARQ uses priority queue for avoiding a final sort, however
>>>>>it is also mentioned that "ARQ's algebra package contains already a
>>>>>OpTopN [3] operator. The OpExecutor [4] will need to use a new
>>>>>QueryIterTopN instead of QueryIterSort + QueryIterSlice." It is not
>>>>>clear now does the priority queue benefit from OpTopN operator and
>>>>>QueryIterTopN as the links [3] and [4] mentioned on the website does
>>>>>not work, so I am not able to understand their operation and as to how
>>>>>do they help in avoiding a total sort.
>>>>>
>>>>>Can someone please explain how does Jena-ARQ execute the queries
>>>>>containing ORDER BY + LIMIT clause.
>>>>>
>>>>>With Warm Regards,
>>>>>Rose
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>--
>>>With Warm Regards,
>>>Rose
>>
>>
>>
>>
>
>
>
>-- 
>With Warm Regards,
>Rose




Reply via email to