[
https://issues.apache.org/jira/browse/PHOENIX-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Taeyun Kim updated PHOENIX-1997:
--------------------------------
Description:
Joined tables are as follows:
{noformat}
create table table_a
(
  time_id integer not null,
  depth tinyint not null,
  id0 integer not null,
  id1 integer not null,
  id2 integer not null,
  id3 integer not null,
  id integer not null,
  record_type smallint not null,
  c varbinary,
  constraint pk primary key(time_id, depth, id0, id1, id2, id3, id, record_type)
)
salt_buckets=4,
compression='SNAPPY';

create table table_b
(
  depth tinyint not null,
  id0 integer not null,
  id1 integer not null,
  id integer not null,
  c varbinary,
  p varbinary,
  constraint pk primary key(depth, id0, id1, id)
)
salt_buckets=2,
compression='SNAPPY';

create index table_b_index on table_b (id, depth, id0, id1)
compression='SNAPPY';
{noformat}
The query is as follows:
{noformat}
select a.*, b.c
from table_a a
inner join table_b b
  on (a.depth = b.depth and a.id0 = b.id0 and a.id1 = b.id1 and a.id2 = b.id)
where a.time_id = 23796900 and a.depth = 1;
{noformat}
It is obvious that b.depth must also be 1, since the join condition includes
a.depth = b.depth and the WHERE clause fixes a.depth = 1. And because depth is
the leading column of table_b's primary key, table_b should be range-scanned
before the join.
But the actual explain plan is as follows:
{noformat}
CLIENT PARALLEL 4-WAY RANGE SCAN OVER TABLE_A [0,23796900,1]
CLIENT MERGE SORT
PARALLEL INNER-JOIN TABLE 0
CLIENT PARALLEL 2-WAY FULL SCAN OVER TABLE_B
CLIENT MERGE SORT
{noformat}
But when the (b.depth = 1) condition is explicitly added to the query, the
explain plan changes to the expected one:
{noformat}
CLIENT PARALLEL 4-WAY RANGE SCAN OVER TABLE_A [0,23796900,1]
CLIENT MERGE SORT
PARALLEL INNER-JOIN TABLE 0
CLIENT PARALLEL 2-WAY RANGE SCAN OVER TABLE_B [0,1]
CLIENT MERGE SORT
{noformat}
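For reference, the rewritten query that produces this plan is just the original
query with the derived (b.depth = 1) predicate added; here it is added to the
WHERE clause, though for an inner join it could equally go into the join
condition:
{noformat}
select a.*, b.c
from table_a a
inner join table_b b
  on (a.depth = b.depth and a.id0 = b.id0 and a.id1 = b.id1 and a.id2 = b.id)
where a.time_id = 23796900 and a.depth = 1 and b.depth = 1;
{noformat}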
It would be nice if the optimizer could detect this condition dependency and
apply the derived condition to the query plan.
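In general terms, the requested optimization is to propagate a constant filter
across an equality in the join condition. A minimal schematic sketch, using
made-up tables t1(k, v) and t2(k, w) rather than the tables from this report:
{noformat}
-- User writes:
select t1.v, t2.w
from t1 inner join t2 on (t1.k = t2.k)
where t1.k = 100;

-- Since t1.k = 100 and t1.k = t2.k, the optimizer could plan the query as if
-- the derived predicate were present:
select t1.v, t2.w
from t1 inner join t2 on (t1.k = t2.k)
where t1.k = 100 and t2.k = 100;
{noformat}
With the derived predicate, the scan over t2 could become a range scan on its
leading primary key column, exactly as happens above when the condition is
added by hand.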
> Join optimization: Apply one table's where condition to the others
> ------------------------------------------------------------------
>
> Key: PHOENIX-1997
> URL: https://issues.apache.org/jira/browse/PHOENIX-1997
> Project: Phoenix
> Issue Type: Improvement
> Affects Versions: 4.3.1
> Reporter: Taeyun Kim
> Priority: Minor
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)