Dear Rob, Thanks a lot again :)
It seems to me that Index Joins are an efficient solution when the query plan is a left deep join tree. However, when the query plan is bushy and the RHS itself includes many Index Join operators (essentially forming a sub-tree), then dont you think computing the entire RHS sub-tree for each join result of the LHS sub-tree is expensive. Please correct me if I am wrong. Or does Jena only builds left deep join trees. On Tue, Mar 10, 2015 at 5:11 PM, Rob Vesse <[email protected]> wrote: > Jena does use hash joins where necessary which are indeed blocking > > However in most cases it can instead use index joins which are essentially > a form of merge join whereby you substitute candidate solutions from the > LHS into the RHS and evaluate the RHS for each LHS solution > > Rob > > On 10/03/2015 11:10, "Rose Beck" <[email protected]> wrote: > >>Dear Rob, >> >>Thanks a lot for the reply again. >> >>But I am curious does Jena implement Hash joins -- which are >>essentially blocking in nature. If Jena does not, then how is a join >>between two unsorted lists (of intermediate results) done in Jena? >> >> >> >>On Tue, Mar 10, 2015 at 4:34 PM, Rob Vesse <[email protected]> wrote: >>> Yes that is pretty much what happens >>> >>> Note though that most query evaluation in ARQ is done in a streaming >>> fashion so the full set of solutions is typically never held in memory >>>for >>> any query unless an operator which requires full /partial >>>materialisation >>> e.g. DISTINCT is encountered >>> >>> Rob >>> >>> On 10/03/2015 10:24, "Rose Beck" <[email protected]> wrote: >>> >>>>Dear Rob, >>>> >>>>First and foremost thank you for such a wonderful explanation. >>>> >>>>Just to clarify, say my example query is: >>>> >>>>select ?a?b?c where{?a <pred1> ?c. ?a <pred2> ?b} order by ?c limit 10 >>>> >>>>Then for the query above all the solutions are generated just as it >>>>would be the case for the SPARQL query: select ?a?b?c where{?a <pred1> >>>>?c. ?a <pred2> ?b}. But within the priority queue (which is the last >>>>operation which is applied before results are output to the users) at >>>>any point just 10 solutions ordered by ?c are placed. Please correct >>>>me if I am wrong. >>>> >>>>Cheers, >>>>Rose >>>> >>>> >>>>On Tue, Mar 10, 2015 at 3:38 PM, Rob Vesse <[email protected]> wrote: >>>>> Query execution in ARQ is based on nested iterators so QueryIterTopN >>>>>will >>>>> always apply over another iterator >>>>> >>>>> A PriorityQueue is used internally as temporary storage within >>>>> QueryIterTopN while it exhausts the inner iterator allowing it to only >>>>>use >>>>> at most the limit amount of storage in the priority queue plus >>>>>whatever >>>>> temporary storage the inner iterator(s) may need. >>>>> >>>>> There is still a "total sort" in the sense that every possible >>>>>solution >>>>> has to be compared to see if it should be placed into the priority >>>>>queue >>>>> however there is not a "total sort" in the sense of needing to >>>>>materialise >>>>> all possible solutions into memory and then sort. >>>>> >>>>> Rob >>>>> >>>>> p.s. Please don't post identical questions to both users@ and dev@ - >>>>>one >>>>> list is sufficient as the developers are on both lists. As a general >>>>>rule >>>>> general support questions should go to users@ and >>>>>technical/architecture >>>>> questions like this should go to dev@ >>>>> >>>>> >>>>> >>>>> On 10/03/2015 05:54, "Rose Beck" <[email protected]> wrote: >>>>> >>>>>>Hi, >>>>>> >>>>>>I saw the following issue posted on Jena website (which has been >>>>>>recently resolved): >>>>>>Avoid a total sort for ORDER BY + LIMIT queries >>>>>>(https://issues.apache.org/jira/browse/JENA-89). >>>>>> >>>>>>I am very interested in understanding as to how does Jena-ARQ avoids >>>>>>total sort for ORDER BY + LIMIT queries. In the post it is mentioned >>>>>>that Jena-ARQ uses priority queue for avoiding a final sort, however >>>>>>it is also mentioned that "ARQ's algebra package contains already a >>>>>>OpTopN [3] operator. The OpExecutor [4] will need to use a new >>>>>>QueryIterTopN instead of QueryIterSort + QueryIterSlice." It is not >>>>>>clear now does the priority queue benefit from OpTopN operator and >>>>>>QueryIterTopN as the links [3] and [4] mentioned on the website does >>>>>>not work, so I am not able to understand their operation and as to how >>>>>>do they help in avoiding a total sort. >>>>>> >>>>>>Can someone please explain how does Jena-ARQ execute the queries >>>>>>containing ORDER BY + LIMIT clause. >>>>>> >>>>>>With Warm Regards, >>>>>>Rose >>>>> >>>>> >>>>> >>>>> >>>> >>>> >>>> >>>>-- >>>>With Warm Regards, >>>>Rose >>> >>> >>> >>> >> >> >> >>-- >>With Warm Regards, >>Rose > > > > -- With Warm Regards, Rose
