Thanks Yingyi. Yes. If there is an equality condition and if we can't transform a join into an index-nested loop join, then a hybrid hash join will be used.
Best, Taewoo On Thu, Jan 7, 2016 at 3:14 PM, Yingyi Bu <[email protected]> wrote: > +1! > >>3. We only try to use applicable indexes from the inner branch. So, if > >>there are no applicable indexes from the inner branch, we abort > >>transforming a join into an index-nested-loop join. > "index-nested-loop join" -> "hybrid hash join"? > > Thanks! > > Yingyi > > > > On Thu, Jan 7, 2016 at 3:01 PM, Taewoo Kim <[email protected]> wrote: > > > Hello dev, > > > > Regarding this issue (ASTERIXDB-1249), I would like to make an index-join > > hint clarification. Let's start with an example query other than a > > self-join query in this issue. > > > > for $c in dataset('Customers') > > > > for $o in dataset('Orders') > > > > where $c.cid /*+ indexnl */ = $o.cid > > > > order by $c.cid, $o.oid > > > > return {"cid":$c.cid, "oid": $o.oid} > > > > Right now, in the master branch, the first dataset (e.g., Customers) > > becomes the outer branch and the second dataset (e.g., Orders) becomes > the > > inner branch. And, when the optimizer tries to honor the given indexnl > hint > > (transforming a join into an index-nested-loop join), if there are > > applicable indexes from the inner branch (e.g., Orders), then it is going > > to use one of those indexes. If there are no applicable indexes from the > > inner branch, it tries to use indexes from the outer branch (e.g., > > Customers). We are going to change the last part; we will not use indexes > > from the outer branch. So, the following are refined rules for > transforming > > a join into an index-nested-loop join. > > > > 1. The first dataset in a join (the first parameter of the given join) > > becomes the outer branch. > > 2. The second dataset in a join (the second parameter of the given join) > > becomes the inner branch. > > 3. We only try to use applicable indexes from the inner branch. So, if > > there are no applicable indexes from the inner branch, we abort > > transforming a join into an index-nested-loop join. > > 4. Variable order in the given join predicate is not important. It can be > > either outer.fieldA = inner.fieldB or inner.fieldB = outer.fieldA. > > > > So, for the left-outer join and inner join altogether, the left subtree > is > > the probing side and the right subtree is the index side. So, this can be > > applied to the self-join case, too just like the following query in this > > issue. In the following query, $t1 becomes the outer and $t2 becomes the > > inner. > > > > for $t1 in dataset('TweetMessages') > > > > for $t2 in dataset('TweetMessages') > > > > let $c := $t1.countA + 20 > > > > where $c /* +indexnl */= $t2.countB > > > > order by $t2.tweetid > > > > return {"tweetid2": $t2.tweetid, "count2":$t2.countB}; > > > > Thank you. Any opinion would be appreciated before I finalize this fix. > > > > Best, > > Taewoo > > > > ---------- Forwarded message ---------- > > From: Yingyi Bu (JIRA) <[email protected]> > > Date: Wed, Jan 6, 2016 at 10:23 AM > > Subject: [jira] [Commented] (ASTERIXDB-1249) Self index join chooses > wrong > > probe/index branch > > To: [email protected] > > > > > > > > [ > > > > > https://issues.apache.org/jira/browse/ASTERIXDB-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085992#comment-15085992 > > ] > > > > Yingyi Bu commented on ASTERIXDB-1249: > > -------------------------------------- > > > > That's right. Basically in AcessMethod implementations, e.g.: > > > > > https://github.com/apache/incubator-asterixdb/blob/master/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/am/RTreeAccessMethod.java#L124 > > > > Choosing probe/index branch is only based on dataset names, instead of > > being based on the join condition. > > > > > Self index join chooses wrong probe/index branch > > > ------------------------------------------------ > > > > > > Key: ASTERIXDB-1249 > > > URL: > > https://issues.apache.org/jira/browse/ASTERIXDB-1249 > > > Project: Apache AsterixDB > > > Issue Type: Bug > > > Components: Optimizer > > > Reporter: Yingyi Bu > > > Assignee: Taewoo Kim > > > > > > DDLs: > > > {noformat} > > > drop dataverse test if exists; > > > create dataverse test; > > > use dataverse test; > > > create type TwitterUserType as closed { > > > screen-name: string, > > > lang: string, > > > friends-count: int64, > > > statuses-count: int64, > > > name: string, > > > followers-count: int64 > > > } > > > create type TweetMessageType as closed { > > > tweetid: int64, > > > user: TwitterUserType, > > > sender-location: point, > > > send-time: datetime, > > > referred-topics: {{ string }}, > > > message-text: string, > > > countA: int64, > > > countB: int64 > > > } > > > create dataset TweetMessages(TweetMessageType) > > > primary key tweetid; > > > create index twmSndLocIx on TweetMessages(sender-location) type rtree; > > > create index msgCountAIx on TweetMessages(countA) type btree; > > > create index msgCountBIx on TweetMessages(countB) type btree; > > > create index msgTextIx on TweetMessages(message-text) type keyword; > > > {noformat} > > > Query: > > > {noformat} > > > for $t1 in dataset('TweetMessages') > > > for $t2 in dataset('TweetMessages') > > > let $n := create-circle($t1.sender-location, 0.5) > > > where spatial-intersect($t2.sender-location, $n) > > > order by $t2.tweetid > > > return {"tweetid2":$t2.tweetid, "loc2":$t2.sender-location}; > > > {noformat} > > > Optimized plan: > > > {noformat} > > > distribute result [%0->$$10] > > > -- DISTRIBUTE_RESULT |PARTITIONED| > > > exchange > > > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > > > project ([$$10]) > > > -- STREAM_PROJECT |PARTITIONED| > > > assign [$$10] <- [function-call: > asterix:closed-record-constructor, > > Args:[AString: {tweetid2}, %0->$$15, AString: {loc2}, %0->$$13]] > > > -- ASSIGN |PARTITIONED| > > > exchange > > > -- SORT_MERGE_EXCHANGE [$$15(ASC) ] |PARTITIONED| > > > order (ASC, %0->$$15) > > > -- STABLE_SORT [$$15(ASC)] |PARTITIONED| > > > exchange > > > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > > > project ([$$13, $$15]) > > > -- STREAM_PROJECT |PARTITIONED| > > > select (function-call: asterix:spatial-intersect, > > Args:[%0->$$13, function-call: asterix:create-circle, > Args:[function-call: > > asterix:field-access-by-index, Args:[%0->$$0, AInt32: {2}], ADouble: > > {0.5}]]) > > > -- STREAM_SELECT |PARTITIONED| > > > project ([$$0, $$13, $$15]) > > > -- STREAM_PROJECT |PARTITIONED| > > > exchange > > > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > > > unnest-map [$$14, $$0] <- function-call: > > asterix:index-search, Args:[AString: {TweetMessages}, AInt32: {0}, > AString: > > {test}, AString: {TweetMessages}, ABoolean: {true}, ABoolean: {false}, > > ABoolean: {false}, AInt32: {1}, %0->$$27, AInt32: {1}, %0->$$27, TRUE, > > TRUE, TRUE] > > > -- BTREE_SEARCH |PARTITIONED| > > > exchange > > > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > > > order (ASC, %0->$$27) > > > -- STABLE_SORT [$$27(ASC)] |PARTITIONED| > > > exchange > > > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > > > project ([$$27, $$13, $$15]) > > > -- STREAM_PROJECT |PARTITIONED| > > > exchange > > > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > > > unnest-map [$$23, $$24, $$25, $$26, > > $$27] <- function-call: asterix:index-search, Args:[AString: > {twmSndLocIx}, > > AInt32: {1}, AString: {test}, AString: {TweetMessages}, ABoolean: {true}, > > ABoolean: {false}, ABoolean: {true}, AInt32: {4}, %0->$$19, %0->$$20, > > %0->$$21, %0->$$22] > > > -- RTREE_SEARCH |PARTITIONED| > > > exchange > > > -- BROADCAST_EXCHANGE > |PARTITIONED| > > > assign [$$19, $$20, $$21, $$22] > <- > > [function-call: asterix:create-mbr, Args:[%0->$$13, AInt32: {2}, AInt32: > > {0}], function-call: asterix:create-mbr, Args:[%0->$$13, AInt32: {2}, > > AInt32: {1}], function-call: asterix:create-mbr, Args:[%0->$$13, AInt32: > > {2}, AInt32: {2}], function-call: asterix:create-mbr, Args:[%0->$$13, > > AInt32: {2}, AInt32: {3}]] > > > -- ASSIGN |PARTITIONED| > > > project ([$$13, $$15]) > > > -- STREAM_PROJECT > |PARTITIONED| > > > assign [$$13] <- > > [function-call: asterix:field-access-by-index, Args:[%0->$$1, AInt32: > {2}]] > > > -- ASSIGN |PARTITIONED| > > > exchange > > > -- ONE_TO_ONE_EXCHANGE > > |PARTITIONED| > > > data-scan []<-[$$15, $$1] > > <- test:TweetMessages > > > -- DATASOURCE_SCAN > > |PARTITIONED| > > > exchange > > > -- ONE_TO_ONE_EXCHANGE > > |PARTITIONED| > > > empty-tuple-source > > > -- EMPTY_TUPLE_SOURCE > > |PARTITIONED| > > > {noformat} > > > The optimized plan is incorrect --- the index search doesn't use the > > right join condition and hence the result is different from expected. > > > > > > > > -- > > This message was sent by Atlassian JIRA > > (v6.3.4#6332) > > >
