[
https://issues.apache.org/jira/browse/ASTERIXDB-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Till updated ASTERIXDB-1296:
----------------------------
Component/s: Optimizer
> Common filtering predicate needs to be pushed down on both sides of a join
> --------------------------------------------------------------------------
>
> Key: ASTERIXDB-1296
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1296
> Project: Apache AsterixDB
> Issue Type: Improvement
> Components: Optimizer
> Reporter: Pouria
> Assignee: Yingyi Bu
>
> A common filtering predicate on the join attributes needs to be pushed down
> into both sides of the join.
> As an example, in the following PK-FK join query the dataset of the FK-side
> is fully scanned and hash-partitioned during the execution, while only those
> records for which (fk > 2) are relevant. The secondary index on this
> attribute can be exploited to filter them (or even they can be filtered out
> during scan, in case of no secondary index). From the performance perspective
> such a filtering could be helpful if the filtering predicate is highly
> selective. Currently one needs to add the filter on the FK-side explicitly
> beside the join predicate as:
> (where $b.fk = $a.id and $b.fk > 2 )
> drop dataverse test if exists;
> create dataverse test;
> use dataverse test;
> create type typeA as {
> id: int64
> }
> create type typeB as {
> id: int64 ,
> fk: int64
> }
> create dataset dsa(typeA) primary key id;
> create dataset dsb(typeB) primary key id;
> insert into dataset dsa ( {"id": 1} );
> insert into dataset dsa ( {"id": 2} );
> insert into dataset dsa ( {"id": 3} );
>
> insert into dataset dsb ( {"id": 1000, "fk": 1} );
> insert into dataset dsb ( {"id": 2000, "fk": 2} );
> insert into dataset dsb ( {"id": 3000, "fk": 3} );
> create index dsbFkIx on dsb(fk);
> for $a in dataset dsa
> where $a.id > 2
> return {
> "aid" : $a.id,
> "val" : for $b in dataset dsb
> where $b.fk = $a.id
> return $b.id
> }
> distribute result [%0->$$5]
> -- DISTRIBUTE_RESULT |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> project ([$$5])
> -- STREAM_PROJECT |PARTITIONED|
> assign [$$5] <- [function-call: asterix:closed-record-constructor,
> Args:[AString: {aid}, %0->$$14, AString: {val}, %0->$$12]]
> -- ASSIGN |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> group by ([$$14 := %0->$$17]) decor ([]) {
> aggregate [$$12] <- [function-call: asterix:listify,
> Args:[%0->$$15]]
> -- AGGREGATE |LOCAL|
> select (function-call: algebricks:not,
> Args:[function-call: algebricks:is-null, Args:[%0->$$16]])
> -- STREAM_SELECT |LOCAL|
> nested tuple source
> -- NESTED_TUPLE_SOURCE |LOCAL|
> }
> -- PRE_CLUSTERED_GROUP_BY[$$17] |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> order (ASC, %0->$$17)
> -- STABLE_SORT [$$17(ASC)] |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> project ([$$16, $$17, $$15])
> -- STREAM_PROJECT |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> left outer join (function-call: algebricks:eq,
> Args:[%0->$$18, %0->$$17])
> -- HYBRID_HASH_JOIN [$$17][$$18] |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> project ([$$17])
> -- STREAM_PROJECT |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> unnest-map [$$17, $$0] <- function-call:
> asterix:index-search, Args:[AString: {dsa}, AInt32: {0}, AString: {test},
> AString: {dsa}, ABoolean: {false}, ABoolean: {false}, ABoolean: {false},
> AInt32: {1}, %0->$$20, AInt32: {0}, FALSE, TRUE, FALSE]
> -- BTREE_SEARCH |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> assign [$$20] <- [AInt64: {2}]
> -- ASSIGN |PARTITIONED|
> empty-tuple-source
> -- EMPTY_TUPLE_SOURCE |PARTITIONED|
> exchange
> -- HASH_PARTITION_EXCHANGE [$$18] |PARTITIONED|
> project ([$$16, $$18, $$15])
> -- STREAM_PROJECT |PARTITIONED|
> assign [$$16, $$18] <- [TRUE, function-call:
> asterix:field-access-by-index, Args:[%0->$$1, AInt32: {1}]]
> -- ASSIGN |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> data-scan []<-[$$15, $$1] <- test:dsb
> -- DATASOURCE_SCAN |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> empty-tuple-source
> -- EMPTY_TUPLE_SOURCE |PARTITIONED|
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)