[
https://issues.apache.org/jira/browse/PHOENIX-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Taeyun Kim updated PHOENIX-1997:
--------------------------------
Description:
Joined tables are as follows:
{noformat}
create table table_a
(
  time_id integer not null,
  depth tinyint not null,
  id0 integer not null,
  id1 integer not null,
  id2 integer not null,
  id3 integer not null,
  id integer not null,
  record_type smallint not null,
  c varbinary,
  constraint pk primary key(time_id, depth, id0, id1, id2, id3, id, record_type)
)
salt_buckets=4,
compression='SNAPPY';

create table table_b
(
  depth tinyint not null,
  id0 integer not null,
  id1 integer not null,
  id integer not null,
  c varbinary,
  p varbinary,
  constraint pk primary key(depth, id0, id1, id)
)
salt_buckets=2,
compression='SNAPPY';

create index table_b_index on table_b (id, depth, id0, id1)
compression='SNAPPY';
{noformat}
The query is as follows:
{noformat}
select a.*, b.c
from table_a a
inner join table_b b
  on (a.depth = b.depth and a.id0 = b.id0 and a.id1 = b.id1 and a.id2 = b.id)
where a.time_id = 23796900 and a.depth = 1;
{noformat}
It is obvious that b.depth must also be 1, since the join condition includes
a.depth = b.depth and the WHERE clause fixes a.depth = 1. And because depth is
the leading column of table_b's primary key, table_b should be range-scanned
before the join.
But the actual explain plan is as follows:
{noformat}
CLIENT PARALLEL 4-WAY RANGE SCAN OVER TABLE_A [0,23796900,1]
CLIENT MERGE SORT
PARALLEL INNER-JOIN TABLE 0
CLIENT PARALLEL 2-WAY FULL SCAN OVER TABLE_B
CLIENT MERGE SORT
{noformat}
But when the (b.depth = 1) condition is explicitly added to the query, the
explain plan changes to the expected one:
{noformat}
CLIENT PARALLEL 4-WAY RANGE SCAN OVER TABLE_A [0,23796900,1]
CLIENT MERGE SORT
PARALLEL INNER-JOIN TABLE 0
CLIENT PARALLEL 2-WAY RANGE SCAN OVER TABLE_B [0,1]
CLIENT MERGE SORT
{noformat}
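For reference, the rewritten query that produces this plan is just the original
query with the derived (b.depth = 1) predicate added; here it is added to the
WHERE clause, though for an inner join it could equally go into the join
condition:
{noformat}
select a.*, b.c
from table_a a
inner join table_b b
  on (a.depth = b.depth and a.id0 = b.id0 and a.id1 = b.id1 and a.id2 = b.id)
where a.time_id = 23796900 and a.depth = 1 and b.depth = 1;
{noformat}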
It would be nice if the optimizer could detect this condition dependency and
apply the derived condition to the query plan.
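In general terms, the requested optimization is to propagate a constant filter
across an equality in the join condition. A minimal schematic sketch, using
made-up tables t1(k, v) and t2(k, w) rather than the tables from this report:
{noformat}
-- User writes:
select t1.v, t2.w
from t1 inner join t2 on (t1.k = t2.k)
where t1.k = 100;

-- Since t1.k = 100 and t1.k = t2.k, the optimizer could plan the query as if
-- the derived predicate were present:
select t1.v, t2.w
from t1 inner join t2 on (t1.k = t2.k)
where t1.k = 100 and t2.k = 100;
{noformat}
With the derived predicate, the scan over t2 could become a range scan on its
leading primary key column, exactly as happens above when the condition is
added by hand.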
> Join optimization: Apply one table's where condition to the others
> ------------------------------------------------------------------
>
> Key: PHOENIX-1997
> URL: https://issues.apache.org/jira/browse/PHOENIX-1997
> Project: Phoenix
> Issue Type: Improvement
> Affects Versions: 4.3.1
> Reporter: Taeyun Kim
> Priority: Minor
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)