Andy Paul's MINUS implementation does use a hash join
Rob On 10/03/2015 12:16, "Andy Seaborne" <[email protected]> wrote: >What's the background for all these questions ??!! > >Index Joins have the useful feature that they take constant memory >overhead. This means one major area where RAM would otherwise run out >is removed. > >ARQ does not currently use hash joins. It should, though SPARQL tends >quite strongly to produce chains of joins. (There are cases in SPARQL >involving OPTIONALs where index joins, with scope elimination, does not >work as an execution strategy. See the literature.) > >I have an alternative library of a collection of various join algorithms >that can be used to build alternative query execution strategies. This >evaluator is not ready for production and not part of the project (yet?). > >Pipeline joins are interesting as a sort of two-sided hash join which is >non-blocking. Useful for multi-core and distributed evaluation. > > Andy > > > >On 10/03/15 11:49, Rose Beck wrote: >> Dear Rob, >> >> Thanks a lot again :) >> >> It seems to me that Index Joins are an efficient solution when the >> query plan is a left deep join tree. However, when the query plan is >> bushy and the RHS itself includes many Index Join operators >> (essentially forming a sub-tree), then dont you think computing the >> entire RHS sub-tree for each join result of the LHS sub-tree is >> expensive. Please correct me if I am wrong. Or does Jena only builds >> left deep join trees. >> >> On Tue, Mar 10, 2015 at 5:11 PM, Rob Vesse <[email protected]> wrote: >>> Jena does use hash joins where necessary which are indeed blocking >>> >>> However in most cases it can instead use index joins which are >>>essentially >>> a form of merge join whereby you substitute candidate solutions from >>>the >>> LHS into the RHS and evaluate the RHS for each LHS solution >>> >>> Rob >>> >>> On 10/03/2015 11:10, "Rose Beck" <[email protected]> wrote: >>> >>>> Dear Rob, >>>> >>>> Thanks a lot for the reply again. >>>> >>>> But I am curious does Jena implement Hash joins -- which are >>>> essentially blocking in nature. If Jena does not, then how is a join >>>> between two unsorted lists (of intermediate results) done in Jena? >>>> >>>> >>>> >>>> On Tue, Mar 10, 2015 at 4:34 PM, Rob Vesse <[email protected]> >>>>wrote: >>>>> Yes that is pretty much what happens >>>>> >>>>> Note though that most query evaluation in ARQ is done in a streaming >>>>> fashion so the full set of solutions is typically never held in >>>>>memory >>>>> for >>>>> any query unless an operator which requires full /partial >>>>> materialisation >>>>> e.g. DISTINCT is encountered >>>>> >>>>> Rob >>>>> >>>>> On 10/03/2015 10:24, "Rose Beck" <[email protected]> wrote: >>>>> >>>>>> Dear Rob, >>>>>> >>>>>> First and foremost thank you for such a wonderful explanation. >>>>>> >>>>>> Just to clarify, say my example query is: >>>>>> >>>>>> select ?a?b?c where{?a <pred1> ?c. ?a <pred2> ?b} order by ?c limit >>>>>>10 >>>>>> >>>>>> Then for the query above all the solutions are generated just as it >>>>>> would be the case for the SPARQL query: select ?a?b?c where{?a >>>>>><pred1> >>>>>> ?c. ?a <pred2> ?b}. But within the priority queue (which is the last >>>>>> operation which is applied before results are output to the users) >>>>>>at >>>>>> any point just 10 solutions ordered by ?c are placed. Please correct >>>>>> me if I am wrong. >>>>>> >>>>>> Cheers, >>>>>> Rose >>>>>> >>>>>> >>>>>> On Tue, Mar 10, 2015 at 3:38 PM, Rob Vesse <[email protected]> >>>>>>wrote: >>>>>>> Query execution in ARQ is based on nested iterators so >>>>>>>QueryIterTopN >>>>>>> will >>>>>>> always apply over another iterator >>>>>>> >>>>>>> A PriorityQueue is used internally as temporary storage within >>>>>>> QueryIterTopN while it exhausts the inner iterator allowing it to >>>>>>>only >>>>>>> use >>>>>>> at most the limit amount of storage in the priority queue plus >>>>>>> whatever >>>>>>> temporary storage the inner iterator(s) may need. >>>>>>> >>>>>>> There is still a "total sort" in the sense that every possible >>>>>>> solution >>>>>>> has to be compared to see if it should be placed into the priority >>>>>>> queue >>>>>>> however there is not a "total sort" in the sense of needing to >>>>>>> materialise >>>>>>> all possible solutions into memory and then sort. >>>>>>> >>>>>>> Rob >>>>>>> >>>>>>> p.s. Please don't post identical questions to both users@ and dev@ >>>>>>>- >>>>>>> one >>>>>>> list is sufficient as the developers are on both lists. As a >>>>>>>general >>>>>>> rule >>>>>>> general support questions should go to users@ and >>>>>>> technical/architecture >>>>>>> questions like this should go to dev@ >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 10/03/2015 05:54, "Rose Beck" <[email protected]> wrote: >>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> I saw the following issue posted on Jena website (which has been >>>>>>>> recently resolved): >>>>>>>> Avoid a total sort for ORDER BY + LIMIT queries >>>>>>>> (https://issues.apache.org/jira/browse/JENA-89). >>>>>>>> >>>>>>>> I am very interested in understanding as to how does Jena-ARQ >>>>>>>>avoids >>>>>>>> total sort for ORDER BY + LIMIT queries. In the post it is >>>>>>>>mentioned >>>>>>>> that Jena-ARQ uses priority queue for avoiding a final sort, >>>>>>>>however >>>>>>>> it is also mentioned that "ARQ's algebra package contains already >>>>>>>>a >>>>>>>> OpTopN [3] operator. The OpExecutor [4] will need to use a new >>>>>>>> QueryIterTopN instead of QueryIterSort + QueryIterSlice." It is >>>>>>>>not >>>>>>>> clear now does the priority queue benefit from OpTopN operator and >>>>>>>> QueryIterTopN as the links [3] and [4] mentioned on the website >>>>>>>>does >>>>>>>> not work, so I am not able to understand their operation and as >>>>>>>>to how >>>>>>>> do they help in avoiding a total sort. >>>>>>>> >>>>>>>> Can someone please explain how does Jena-ARQ execute the queries >>>>>>>> containing ORDER BY + LIMIT clause. >>>>>>>> >>>>>>>> With Warm Regards, >>>>>>>> Rose >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> With Warm Regards, >>>>>> Rose >>>>> >>>>> >>>>> >>>>> >>>> >>>> >>>> >>>> -- >>>> With Warm Regards, >>>> Rose >>> >>> >>> >>> >> >> >> >
