[ 
https://issues.apache.org/jira/browse/HIVE-21950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Murshid Chalaev updated HIVE-21950:
-----------------------------------
    Description: 
Queries with SUBSTR function in EXISTS clause have much more complicated 
execution plan in Hive-2.3 with enabled CBO then with disabled. A query below 
has 8 stages which submit 4 MR jobs in Hive-2.3 when CBO is enabled, while with 
disabled it has 4 stages and submits 1 MR job.

 *STEPS TO REPRODUCE:*
{code:java}
CREATE TABLE i1122 (id STRING);
INSERT INTO i1122 VALUES (1),(1001); 

EXPLAIN
SELECT *
FROM i1122 AS t1
WHERE EXISTS (
    SELECT 1
    FROM i1122 AS t2
    WHERE t2.id = substr(t1.id,4)
);{code}

*ACTUAL RESULT:*
Explain plan in Hive-2.3 with disabled CBO(The same execution plan was in 
Hive-1.2 with enabled CBO):
{code:java}
hive> SET hive.cbo.enable=false;
hive> EXPLAIN
    > SELECT *
    > FROM i1122 AS t1
    > WHERE EXISTS (
    >     SELECT 1
    >     FROM i1122 AS t2
    >     WHERE t2.id = substr(t1.id,4)
    > );
OK
STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-3 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-3

STAGE PLANS:
...
{code}

Explain plan in Hive-2.3 with enabled CBO:

{code:java}
hive> SET hive.cbo.enable=true;
hive> EXPLAIN
    > SELECT *
    > FROM i1122 AS t1
    > WHERE EXISTS (
    >     SELECT 1
    >     FROM i1122 AS t2
    >     WHERE t2.id = substr(t1.id,4)
    > );
OK
STAGE DEPENDENCIES:
  Stage-3 is a root stage
  Stage-8 depends on stages: Stage-3
  Stage-6 depends on stages: Stage-8
  Stage-5 depends on stages: Stage-6 , consists of Stage-7, Stage-1
  Stage-7 has a backup stage: Stage-1
  Stage-4 depends on stages: Stage-7
  Stage-1
  Stage-0 depends on stages: Stage-4, Stage-1

STAGE PLANS:
...
{code}

Full explain plans you can find in attachment

  was:
Queries with SUBSTR function in EXISTS clause have much more complicated 
execution plan in Hive-2.3 with enabled CBO then in Hive-2.3 with disabled CBO. 
A query below has 8 stages which submit 4 MR jobs in Hive-2.3 with enabled CBO, 
while with disabled CBO it has 4 stages and submits 1 MR job.

 *STEPS TO REPRODUCE:*
{code:java}
CREATE TABLE i1122 (id STRING);
INSERT INTO i1122 VALUES (1),(1001); 

EXPLAIN
SELECT *
FROM i1122 AS t1
WHERE EXISTS (
    SELECT 1
    FROM i1122 AS t2
    WHERE t2.id = substr(t1.id,4)
);{code}

*ACTUAL RESULT:*
Explain plan in Hive-2.3 with disabled CBO(The same execution plan was in 
Hive-1.2 with enabled CBO):
{code:java}
hive> SET hive.cbo.enable=false;
hive> EXPLAIN
    > SELECT *
    > FROM i1122 AS t1
    > WHERE EXISTS (
    >     SELECT 1
    >     FROM i1122 AS t2
    >     WHERE t2.id = substr(t1.id,4)
    > );
OK
STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-3 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-3

STAGE PLANS:
...
{code}

Explain plan in Hive-2.3 with enabled CBO:

{code:java}
hive> SET hive.cbo.enable=true;
hive> EXPLAIN
    > SELECT *
    > FROM i1122 AS t1
    > WHERE EXISTS (
    >     SELECT 1
    >     FROM i1122 AS t2
    >     WHERE t2.id = substr(t1.id,4)
    > );
OK
STAGE DEPENDENCIES:
  Stage-3 is a root stage
  Stage-8 depends on stages: Stage-3
  Stage-6 depends on stages: Stage-8
  Stage-5 depends on stages: Stage-6 , consists of Stage-7, Stage-1
  Stage-7 has a backup stage: Stage-1
  Stage-4 depends on stages: Stage-7
  Stage-1
  Stage-0 depends on stages: Stage-4, Stage-1

STAGE PLANS:
...
{code}

Full explain plans you can find in attachment


> Optimizer complicates execution plan of queries with SUBSTR function in 
> EXISTS clause
> -------------------------------------------------------------------------------------
>
>                 Key: HIVE-21950
>                 URL: https://issues.apache.org/jira/browse/HIVE-21950
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 2.3.0, 3.1.1
>            Reporter: Murshid Chalaev
>            Priority: Minor
>         Attachments: disabled_CBO.txt, enabled_CBO.txt
>
>
> Queries with SUBSTR function in EXISTS clause have much more complicated 
> execution plan in Hive-2.3 with enabled CBO then with disabled. A query below 
> has 8 stages which submit 4 MR jobs in Hive-2.3 when CBO is enabled, while 
> with disabled it has 4 stages and submits 1 MR job.
>  *STEPS TO REPRODUCE:*
> {code:java}
> CREATE TABLE i1122 (id STRING);
> INSERT INTO i1122 VALUES (1),(1001); 
> EXPLAIN
> SELECT *
> FROM i1122 AS t1
> WHERE EXISTS (
>     SELECT 1
>     FROM i1122 AS t2
>     WHERE t2.id = substr(t1.id,4)
> );{code}
> *ACTUAL RESULT:*
> Explain plan in Hive-2.3 with disabled CBO(The same execution plan was in 
> Hive-1.2 with enabled CBO):
> {code:java}
> hive> SET hive.cbo.enable=false;
> hive> EXPLAIN
>     > SELECT *
>     > FROM i1122 AS t1
>     > WHERE EXISTS (
>     >     SELECT 1
>     >     FROM i1122 AS t2
>     >     WHERE t2.id = substr(t1.id,4)
>     > );
> OK
> STAGE DEPENDENCIES:
>   Stage-4 is a root stage
>   Stage-3 depends on stages: Stage-4
>   Stage-0 depends on stages: Stage-3
> STAGE PLANS:
> ...
> {code}
> Explain plan in Hive-2.3 with enabled CBO:
> {code:java}
> hive> SET hive.cbo.enable=true;
> hive> EXPLAIN
>     > SELECT *
>     > FROM i1122 AS t1
>     > WHERE EXISTS (
>     >     SELECT 1
>     >     FROM i1122 AS t2
>     >     WHERE t2.id = substr(t1.id,4)
>     > );
> OK
> STAGE DEPENDENCIES:
>   Stage-3 is a root stage
>   Stage-8 depends on stages: Stage-3
>   Stage-6 depends on stages: Stage-8
>   Stage-5 depends on stages: Stage-6 , consists of Stage-7, Stage-1
>   Stage-7 has a backup stage: Stage-1
>   Stage-4 depends on stages: Stage-7
>   Stage-1
>   Stage-0 depends on stages: Stage-4, Stage-1
> STAGE PLANS:
> ...
> {code}
> Full explain plans you can find in attachment



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to