[
https://issues.apache.org/jira/browse/HIVE-20304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670153#comment-16670153
]
Yongzhi Chen commented on HIVE-20304:
-------------------------------------
[~BIGrey], could you attach a new patch to make the pre-commit build pass?
Thanks
> When hive.optimize.skewjoin and hive.auto.convert.join are both set to true,
> and the execution engine is mr, same stage may launch twice due to the wrong
> generated plan
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-20304
> URL: https://issues.apache.org/jira/browse/HIVE-20304
> Project: Hive
> Issue Type: Bug
> Components: CLI
> Affects Versions: 1.2.1, 2.3.3
> Reporter: Hui Huang
> Assignee: Hui Huang
> Priority: Major
> Fix For: 1.2.1
>
> Attachments: HIVE-20304.1.patch, HIVE-20304.patch
>
>
> `When hive.optimize.skewjoin and hive.auto.convert.join are both set to true,
> and the execution engine is set to mr, same stage of a query may launch twice
> due to the wrong generated plan. If hive.exec.parallel is also true, the same
> stage will launch at the same time and the job will failed due to the first
> completed stage clear the map.xml/reduce.xml file stored in the hdfs.
> use following sql to reproduce the issue:
> {code:java}
> CREATE TABLE `tbl1`(
> `fence` string);
> CREATE TABLE `tbl2`(
> `order_id` string,
> `phone` string,
> `search_id` string
> )
> PARTITIONED BY (
> `dt` string);
> CREATE TABLE `tbl3`(
> `order_id` string,
> `platform` string)
> PARTITIONED BY (
> `dt` string);
> CREATE TABLE `tbl4`(
> `groupname` string,
> `phone` string)
> PARTITIONED BY (
> `dt` string);
> CREATE TABLE `tbl5`(
> `search_id` string,
> `fence` string)
> PARTITIONED BY (
> `dt` string);
> SET hive.exec.parallel = TRUE;
> SET hive.auto.convert.join = TRUE;
> SET hive.optimize.skewjoin = TRUE;
> SELECT dt,
> platform,
> groupname,
> count(1) as cnt
> FROM
> (SELECT dt,
> platform,
> groupname
> FROM
> (SELECT fence
> FROM tbl1)ta
> JOIN
> (SELECT a0.dt,
> a1.platform,
> a2.groupname,
> a3.fence
> FROM
> (SELECT dt,
> order_id,
> phone,
> search_id
> FROM tbl2
> WHERE dt =20180703 )a0
> JOIN
> (SELECT order_id,
> platform,
> dt
> FROM tbl3
> WHERE dt =20180703 )a1 ON a0.order_id = a1.order_id
> INNER JOIN
> (SELECT groupname,
> phone,
> dt
> FROM tbl4
> WHERE dt =20180703 )a2 ON a0.phone = a2.phone
> LEFT JOIN
> (SELECT search_id,
> fence,
> dt
> FROM tbl5
> WHERE dt =20180703)a3 ON a0.search_id = a3.search_id)t0 ON
> ta.fence = t0.fence)t11
> GROUP BY dt,
> platform,
> groupname;
> DROP TABLE tbl1;
> DROP TABLE tbl2;
> DROP TABLE tbl3;
> DROP TABLE tbl4;
> DROP TABLE tbl5;
> {code}
> We will get some error message like this:
> Examining task ID: task_1531284442065_3637_m_000000 (and more) from job
> job_1531284442065_3637
> Task with the most failures(4):
> ----
> Task ID:
> task_1531284442065_3637_m_000000
> URL:
>
> [http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1531284442065_3637&tipid=task_1531284442065_3637_m_000000]
> ----
> Diagnostic Messages for this Task:
> File does not exist:
> hdfs://test/tmp/hive-hadoop/hadoop/fe5efa94-abb1-420f-b6ba-ec782e7b79ad/hive_2018-08-03_17-00-17_707_592882314975289971-5/-mr-10045/757eb1f7-7a37-4a7e-abc0-4a3b8b06510c/reduce.xml
> java.io.FileNotFoundException: File does not exist:
> hdfs://test/tmp/hive-hadoop/hadoop/fe5efa94-abb1-420f-b6ba-ec782e7b79ad/hive_2018-08-03_17-00-17_707_592882314975289971-5/-mr-10045/757eb1f7-7a37-4a7e-abc0-4a3b8b06510c/reduce.xml
> Looking into the plan by executing explain, I found that the Stage-4 and
> Stage-5 can reached from multi root tasks.
> {code:java}
> Explain
> STAGE DEPENDENCIES:
> Stage-21 is a root stage , consists of Stage-34, Stage-5
> Stage-34 has a backup stage: Stage-5
> Stage-20 depends on stages: Stage-34
> Stage-17 depends on stages: Stage-5, Stage-18, Stage-20 , consists of
> Stage-32, Stage-33, Stage-1
> Stage-32 has a backup stage: Stage-1
> Stage-15 depends on stages: Stage-32
> Stage-10 depends on stages: Stage-1, Stage-15, Stage-16 , consists of
> Stage-31, Stage-2
> Stage-31
> Stage-9 depends on stages: Stage-31
> Stage-2 depends on stages: Stage-9
> Stage-33 has a backup stage: Stage-1
> Stage-16 depends on stages: Stage-33
> Stage-1
> Stage-5
> Stage-27 is a root stage , consists of Stage-37, Stage-38, Stage-4
> Stage-37 has a backup stage: Stage-4
> Stage-25 depends on stages: Stage-37
> Stage-12 depends on stages: Stage-4, Stage-22, Stage-23, Stage-25, Stage-26
> , consists of Stage-36, Stage-5
> Stage-36
> Stage-11 depends on stages: Stage-36
> Stage-19 depends on stages: Stage-11 , consists of Stage-35, Stage-5
> Stage-35 has a backup stage: Stage-5
> Stage-18 depends on stages: Stage-35
> Stage-38 has a backup stage: Stage-4
> Stage-26 depends on stages: Stage-38
> Stage-4
> Stage-30 is a root stage , consists of Stage-42, Stage-43, Stage-3
> Stage-42 has a backup stage: Stage-3
> Stage-28 depends on stages: Stage-42
> Stage-14 depends on stages: Stage-3, Stage-28, Stage-29 , consists of
> Stage-41, Stage-4
> Stage-41
> Stage-13 depends on stages: Stage-41
> Stage-24 depends on stages: Stage-13 , consists of Stage-39, Stage-40,
> Stage-4
> Stage-39 has a backup stage: Stage-4
> Stage-22 depends on stages: Stage-39
> Stage-40 has a backup stage: Stage-4
> Stage-23 depends on stages: Stage-40
> Stage-43 has a backup stage: Stage-3
> Stage-29 depends on stages: Stage-43
> Stage-3
> Stage-0 depends on stages: Stage-2
> {code}
> After skewjoin optimization, the processed node is added into the listTasks
> of ConditionalTask and the parentTask of the processed node is removed and
> during the commonJoin optimization of listTasks of ConditionalTask, the new
> generated condTask will be added into root task list due to parentTask is
> null.
> workaround: do not set hive.optimize.skewjoin and hive.auto.convert.join to
> true at the same time.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)