Jena does use hash joins where necessary which are indeed blocking However in most cases it can instead use index joins which are essentially a form of merge join whereby you substitute candidate solutions from the LHS into the RHS and evaluate the RHS for each LHS solution
Rob On 10/03/2015 11:10, "Rose Beck" <[email protected]> wrote: >Dear Rob, > >Thanks a lot for the reply again. > >But I am curious does Jena implement Hash joins -- which are >essentially blocking in nature. If Jena does not, then how is a join >between two unsorted lists (of intermediate results) done in Jena? > > > >On Tue, Mar 10, 2015 at 4:34 PM, Rob Vesse <[email protected]> wrote: >> Yes that is pretty much what happens >> >> Note though that most query evaluation in ARQ is done in a streaming >> fashion so the full set of solutions is typically never held in memory >>for >> any query unless an operator which requires full /partial >>materialisation >> e.g. DISTINCT is encountered >> >> Rob >> >> On 10/03/2015 10:24, "Rose Beck" <[email protected]> wrote: >> >>>Dear Rob, >>> >>>First and foremost thank you for such a wonderful explanation. >>> >>>Just to clarify, say my example query is: >>> >>>select ?a?b?c where{?a <pred1> ?c. ?a <pred2> ?b} order by ?c limit 10 >>> >>>Then for the query above all the solutions are generated just as it >>>would be the case for the SPARQL query: select ?a?b?c where{?a <pred1> >>>?c. ?a <pred2> ?b}. But within the priority queue (which is the last >>>operation which is applied before results are output to the users) at >>>any point just 10 solutions ordered by ?c are placed. Please correct >>>me if I am wrong. >>> >>>Cheers, >>>Rose >>> >>> >>>On Tue, Mar 10, 2015 at 3:38 PM, Rob Vesse <[email protected]> wrote: >>>> Query execution in ARQ is based on nested iterators so QueryIterTopN >>>>will >>>> always apply over another iterator >>>> >>>> A PriorityQueue is used internally as temporary storage within >>>> QueryIterTopN while it exhausts the inner iterator allowing it to only >>>>use >>>> at most the limit amount of storage in the priority queue plus >>>>whatever >>>> temporary storage the inner iterator(s) may need. >>>> >>>> There is still a "total sort" in the sense that every possible >>>>solution >>>> has to be compared to see if it should be placed into the priority >>>>queue >>>> however there is not a "total sort" in the sense of needing to >>>>materialise >>>> all possible solutions into memory and then sort. >>>> >>>> Rob >>>> >>>> p.s. Please don't post identical questions to both users@ and dev@ - >>>>one >>>> list is sufficient as the developers are on both lists. As a general >>>>rule >>>> general support questions should go to users@ and >>>>technical/architecture >>>> questions like this should go to dev@ >>>> >>>> >>>> >>>> On 10/03/2015 05:54, "Rose Beck" <[email protected]> wrote: >>>> >>>>>Hi, >>>>> >>>>>I saw the following issue posted on Jena website (which has been >>>>>recently resolved): >>>>>Avoid a total sort for ORDER BY + LIMIT queries >>>>>(https://issues.apache.org/jira/browse/JENA-89). >>>>> >>>>>I am very interested in understanding as to how does Jena-ARQ avoids >>>>>total sort for ORDER BY + LIMIT queries. In the post it is mentioned >>>>>that Jena-ARQ uses priority queue for avoiding a final sort, however >>>>>it is also mentioned that "ARQ's algebra package contains already a >>>>>OpTopN [3] operator. The OpExecutor [4] will need to use a new >>>>>QueryIterTopN instead of QueryIterSort + QueryIterSlice." It is not >>>>>clear now does the priority queue benefit from OpTopN operator and >>>>>QueryIterTopN as the links [3] and [4] mentioned on the website does >>>>>not work, so I am not able to understand their operation and as to how >>>>>do they help in avoiding a total sort. >>>>> >>>>>Can someone please explain how does Jena-ARQ execute the queries >>>>>containing ORDER BY + LIMIT clause. >>>>> >>>>>With Warm Regards, >>>>>Rose >>>> >>>> >>>> >>>> >>> >>> >>> >>>-- >>>With Warm Regards, >>>Rose >> >> >> >> > > > >-- >With Warm Regards, >Rose
